WO2021057027A1 - Human body detection method and apparatus, computer device, and storage medium - Google Patents

Human body detection method and apparatus, computer device, and storage medium

Info

Publication number
WO2021057027A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature matrix
contour
feature
target
bone
Prior art date
Application number
PCT/CN2020/087826
Other languages
English (en)
French (fr)
Inventor
段浩东
刘文韬
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to AU2020335016A (publication AU2020335016A1)
Priority to KR1020207037358A (publication KR20210038436A)
Priority to JP2020572391A (publication JP7101829B2)
Priority to SG11202101794SA (publication SG11202101794SA)
Priority to EP20853555.9A (publication EP3828765A4)
Priority to US17/181,376 (publication US20210174074A1)
Publication of WO2021057027A1

Classifications

    • G06N3/08 — Neural networks; learning methods
    • G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253 — Fusion techniques of extracted features
    • G06N3/045 — Neural network architectures; combinations of networks
    • G06T7/13 — Image analysis; edge detection
    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, loops, corners, strokes or intersections; connectivity analysis
    • G06V10/454 — Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/764 — Recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/806 — Fusion of extracted features
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06V40/23 — Recognition of whole body movements, e.g. for sport training
    • G06T2207/20081 — Training; learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G06T2207/30196 — Human being; person
    • G06V2201/033 — Recognition of patterns in medical or anatomical images of skeletal patterns

Definitions

  • The present disclosure relates to the field of image processing technology, and in particular to a human body detection method and apparatus, a computer device, and a storage medium.
  • In fields such as image, video, voice, and text, users place ever higher demands on the accuracy of models based on neural networks.
  • Human body detection in images is an important application scenario for neural networks, one that demands both high detection precision and a manageable amount of computation.
  • The purpose of the embodiments of the present disclosure is to provide a human body detection method and apparatus, a computer device, and a storage medium.
  • Embodiments of the present disclosure provide a human body detection method, including: acquiring an image to be detected; determining, based on the image to be detected, the position information of bone key points used to characterize the skeletal structure of the human body and the position information of contour key points used to characterize the contour of the human body; and generating a human body detection result based on the position information of the bone key points and the position information of the contour key points.
  • The embodiments of the present disclosure can determine, from the image to be detected, the position information of bone key points characterizing the human skeletal structure and the position information of contour key points characterizing the human body contour, and generate the human body detection result from both, improving the fineness of the representation while keeping the amount of computation in check.
  • The information characterizing the human body is therefore richer and supports broader application scenarios, such as image editing and body shape adjustment; a sketch of the overall flow follows.
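The following is a minimal sketch of that three-step flow in Python, assuming a network that maps a batched image to the two keypoint position sets; `detect_human`, the result layout, and the file path are illustrative, not taken from the patent.

```python
import torch
from torchvision.io import read_image

def detect_human(image_path: str, model) -> dict:
    """Sketch of the claimed steps: acquire the image to be detected,
    locate bone and contour key points, assemble the detection result.
    `model` is assumed to return pixel coordinates for both sets."""
    image = read_image(image_path).float() / 255.0         # acquire image (C, H, W)
    with torch.no_grad():
        bone_pos, contour_pos = model(image.unsqueeze(0))  # locate key points
    return {"bone_keypoints": bone_pos, "contour_keypoints": contour_pos}
```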
  • The contour key points include main contour key points and auxiliary contour key points, where at least one auxiliary contour key point lies between two adjacent main contour key points.
  • The human body contour is characterized by the position information of the main contour key points together with that of the auxiliary contour key points, so that the contour is identified more accurately and carries richer information.
  • Determining the position information of the contour key points used to characterize the human body contour includes: determining the position information of the main contour key points based on the image to be detected; determining human body contour information based on the position information of the main contour key points; and determining the position information of multiple auxiliary contour key points based on the determined human body contour information.
  • In this way, the position information of both the main contour key points and the auxiliary contour key points can be located more accurately; a sketch of the auxiliary-point step follows.
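The patent does not specify how auxiliary points are placed on the determined contour; the sketch below assumes ordered main points and linear interpolation between neighbours, purely as an illustration.

```python
import numpy as np

def auxiliary_points(main_pts: np.ndarray, per_segment: int = 1) -> np.ndarray:
    """main_pts: (N, 2) main contour key points ordered along the body
    outline.  Places `per_segment` auxiliary points on each contour
    segment between adjacent main points; linear interpolation stands
    in for the unspecified human body contour information."""
    aux = []
    for a, b in zip(main_pts, np.roll(main_pts, -1, axis=0)):  # closes the loop
        for t in np.linspace(0.0, 1.0, per_segment + 2)[1:-1]:
            aux.append((1.0 - t) * a + t * b)
    return np.asarray(aux)
```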
  • The human body detection result includes one or more of the following: the image to be detected with bone key point markers and contour key point markers added; and a data group containing the position information of the bone key points and the position information of the contour key points.
  • The image to be detected with the bone and contour key point markers gives a more intuitive visual impression, while the data group containing the position information of both kinds of key points is easier to use in subsequent processing.
  • The method further includes performing one or more of the following operations based on the human body detection result: human body motion recognition, human body posture detection, human body contour adjustment, human body image editing, and human body mapping.
  • Determining, based on the image to be detected, the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the human body contour includes: performing feature extraction on the image to be detected to obtain bone features and contour features, and fusing the obtained bone features and contour features; and determining the position information of the bone key points and the position information of the contour key points based on the feature fusion result.
  • In this way, feature extraction can be performed on the image to be detected to obtain bone features and contour features, and the two can be fused to obtain the position information of the bone key points characterizing the human skeletal structure and of the contour key points characterizing the human body contour.
  • The human body detection result obtained in this way represents the human body with a smaller amount of data, while the extracted bone and contour features improve the fineness of the representation.
  • Performing feature extraction based on the image to be detected to obtain bone features and contour features, and fusing the obtained features, includes: performing feature extraction at least once based on the image to be detected, and fusing the bone features and contour features obtained from each extraction.
  • Here, i is a positive integer. Determining, based on the feature fusion result, the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the human body contour includes: determining both from the feature fusion result of the last feature fusion.
  • At least one feature extraction is performed on the image to be detected, and the bone features and contour features obtained from each extraction are fused, so that bone key points and contour key points with an associated position relationship can correct one another; the position information finally obtained for both therefore has higher accuracy.
  • Performing at least one feature extraction based on the image to be detected includes: in the first feature extraction, using a pre-trained first feature extraction network to extract, from the image to be detected, the first target bone feature matrix of the bone key points characterizing the human bone features and the first target contour feature matrix of the contour key points characterizing the human body contour; and in the (i+1)-th feature extraction, using a pre-trained second feature extraction network to extract the same two matrices from the feature fusion result of the i-th feature fusion. The network parameters of the first feature extraction network and the second feature extraction network differ, and the second feature extraction networks used in different feature extractions also have different parameters, as sketched below.
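A structural sketch of that recursion, assuming each extractor and each fuser returns a (bone, contour) pair of feature matrices; the class and argument names are illustrative.

```python
import torch.nn as nn

class IterativeDetector(nn.Module):
    """Extract-then-fuse recursion: the first extraction runs on the
    image, every later extraction runs on the previous fusion result,
    and each stage has its own parameters, as the patent requires."""
    def __init__(self, first_extractor, later_extractors, fusers):
        super().__init__()
        self.first = first_extractor                  # image -> (bone, contour)
        self.later = nn.ModuleList(later_extractors)  # fusion result -> (bone, contour)
        self.fusers = nn.ModuleList(fusers)           # one fuser per extraction stage

    def forward(self, image):
        bone, contour = self.first(image)
        bone, contour = self.fusers[0](bone, contour)
        for extract, fuse in zip(self.later, self.fusers[1:]):
            bone, contour = extract(bone, contour)
            bone, contour = fuse(bone, contour)
        return bone, contour  # last fusion result yields the key point positions
```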
  • The bone features and contour features are extracted and fused at least once, so the position information of the bone key points and of the contour key points is finally obtained with higher accuracy.
  • Fusing the extracted bone features and contour features includes: using a pre-trained feature fusion neural network to fuse the first target bone feature matrix and the first target contour feature matrix, obtaining a second target bone feature matrix and a second target contour feature matrix; the second target bone feature matrix is a three-dimensional bone feature matrix, and the three-dimensional bone feature matrix includes a two-dimensional bone feature matrix corresponding to each bone key point.
  • The bone features and contour features are fused by the pre-trained feature fusion network, which yields better fusion results, so that the position information of the bone key points and of the contour key points finally obtained has higher precision.
  • Determining the position information of the bone key points and of the contour key points based on the result of the last feature fusion includes: determining the position information of the bone key points based on the second target bone feature matrix obtained from the last feature fusion, and determining the position information of the contour key points based on the second target contour feature matrix obtained from the last feature fusion.
  • In this way, the position information of the bone key points and of the contour key points finally obtained has higher accuracy.
  • The first feature extraction network includes: a common feature extraction network, a first bone feature extraction network, and a first contour feature extraction network. Using the first feature extraction network to extract the first target bone feature matrix and the first target contour feature matrix from the image to be detected includes: using the common feature extraction network to perform convolution processing on the image to be detected to obtain a basic feature matrix containing bone features and contour features; using the first bone feature extraction network to perform convolution processing on the basic feature matrix to obtain a first bone feature matrix, obtaining a second bone feature matrix from the first target convolutional layer in the first bone feature extraction network, and obtaining the first target bone feature matrix based on the first and second bone feature matrices, where the first target convolutional layer is any convolutional layer except the last one in the first bone feature extraction network; and using the first contour feature extraction network analogously to obtain the first target contour feature matrix.
  • The common feature extraction network extracts the bone features and contour features while discarding other information in the image to be detected; the first bone feature extraction network then extracts the bone features in a targeted manner, and the first contour feature extraction network extracts the contour features in a targeted manner, which requires less computation. A structural sketch follows.
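A minimal sketch of the Fig. 3 topology, assuming plain 3x3 convolutions; the channel widths, depths, and keypoint counts (14 bone, 25 contour, taken from the example later in the text) are placeholders.

```python
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class FirstFeatureExtractor(nn.Module):
    """A shared trunk produces the basic feature matrix; two task
    branches then separate bone features from contour features."""
    def __init__(self, n_bone: int = 14, n_contour: int = 25):
        super().__init__()
        self.common = nn.Sequential(conv_block(3, 64), conv_block(64, 64))
        self.bone_branch = nn.Sequential(conv_block(64, 64), nn.Conv2d(64, n_bone, 1))
        self.contour_branch = nn.Sequential(conv_block(64, 64), nn.Conv2d(64, n_contour, 1))

    def forward(self, image):
        base = self.common(image)  # basic feature matrix with both feature kinds
        return self.bone_branch(base), self.contour_branch(base)
```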
  • Obtaining the first target bone feature matrix based on the first bone feature matrix and the second bone feature matrix includes: splicing the first bone feature matrix and the second bone feature matrix to obtain a first spliced bone feature matrix, and performing dimensional transformation processing on it to obtain the first target bone feature matrix. Obtaining the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix includes: splicing the first contour feature matrix and the second contour feature matrix to obtain a first spliced contour feature matrix, and performing dimensional transformation processing on it to obtain the first target contour feature matrix. The first target bone feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each dimension.
  • Splicing the first and second bone feature matrices gives the first target bone feature matrix richer bone feature information; likewise, splicing the first and second contour feature matrices gives the first target contour feature matrix richer contour feature information. The position information of the bone key points and of the contour key points can then be extracted with higher precision.
  • The feature fusion neural network includes: a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network. Using the feature fusion neural network to fuse the first target bone feature matrix and the first target contour feature matrix into a second target bone feature matrix and a second target contour feature matrix includes: using the first convolutional neural network to perform convolution processing on the first target bone feature matrix to obtain a first intermediate bone feature matrix; using the second convolutional neural network to perform convolution processing on the first target contour feature matrix to obtain a first intermediate contour feature matrix; splicing the first intermediate contour feature matrix with the first target bone feature matrix to obtain a first spliced feature matrix, and performing dimensional transformation on it with the first transformation neural network to obtain the second target bone feature matrix; and splicing the first intermediate bone feature matrix with the first target contour feature matrix to obtain a second spliced feature matrix, which the second transformation neural network transforms into the second target contour feature matrix.
  • Splicing the first intermediate contour feature matrix with the first target bone feature matrix and deriving the second target bone feature matrix from the splicing result fuses contour features into the bone features, so that the extracted bone features are corrected by the contour features; symmetrically, fusing bone features into the contour features corrects the extracted contour features. The position information of the bone key points and of the contour key points can therefore be extracted with higher accuracy.
  • Alternatively, the feature fusion neural network includes: a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, a fourth convolutional neural network, a third transformation neural network, and a fourth transformation neural network. Using this feature fusion neural network to fuse the first target bone feature matrix and the first target contour feature matrix into the second target bone feature matrix and the second target contour feature matrix includes: using the first directional convolutional neural network to perform directional convolution processing on the first target bone feature matrix to obtain a first directional bone feature matrix, and using the third convolutional neural network to perform convolution processing on the first directional bone feature matrix to obtain a second intermediate bone feature matrix; and using the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix to obtain a first directional contour feature matrix, and using the fourth convolutional neural network to perform convolution processing on the first directional contour feature matrix to obtain a second intermediate contour feature matrix.
  • Fusing the features by means of directional convolution likewise allows the position information of the bone key points and of the contour key points to be extracted with higher accuracy.
  • The feature fusion neural network may also include: a displacement estimation neural network and a fifth transformation neural network. Using this feature fusion neural network to fuse the first target bone feature matrix and the first target contour feature matrix into the second target bone feature matrix and the second target contour feature matrix includes: splicing the first target bone feature matrix and the first target contour feature matrix to obtain a fifth spliced feature matrix; inputting the fifth spliced feature matrix into the displacement estimation neural network, which performs displacement estimation on multiple predetermined groups of key point pairs to obtain, for each pair, the displacement information for moving one key point of the pair to the other; taking each key point of each pair in turn as the current key point, obtaining the two-dimensional feature matrix corresponding to the paired key point from the three-dimensional feature matrix it belongs to, and transforming the positions of the elements in that two-dimensional feature matrix according to the displacement information from the paired key point to the current key point, which yields the displacement feature matrix corresponding to the current key point; for each bone key point, splicing its two-dimensional feature matrix with its displacement feature matrix to obtain that bone key point's spliced two-dimensional feature matrix, and inputting the spliced matrix into the fifth transformation neural network to obtain the target two-dimensional feature matrix for that bone key point, the second target bone feature matrix being generated from the target two-dimensional feature matrices of all bone key points; and, for each contour key point, proceeding in the same way to generate the second target contour feature matrix.
  • Feature fusion is thus realized by performing displacement transformations on the bone key points and the contour key points, and the position information of both can be extracted with higher accuracy; a per-pair sketch follows.
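The sketch below illustrates the per-pair step under two assumptions not fixed by the text: displacements are rounded to whole pixels, and `torch.roll` stands in for the unspecified element position transformation.

```python
import torch

def shift_heatmap(hm: torch.Tensor, dx: int, dy: int) -> torch.Tensor:
    """Move every element of a (H, W) two-dimensional feature matrix by
    the estimated displacement (integer pixels assumed)."""
    return torch.roll(hm, shifts=(dy, dx), dims=(0, 1))

def fuse_pair(cur_hm, partner_hm, dx, dy, transform_net):
    """For one key point pair: warp the partner's 2-D feature matrix
    toward the current key point, splice it with the current matrix,
    and let a small transform network (e.g. nn.Conv2d(2, 1, 3,
    padding=1)) produce the fused target 2-D feature matrix."""
    displaced = shift_heatmap(partner_hm, dx, dy)            # displacement feature matrix
    spliced = torch.stack([cur_hm, displaced]).unsqueeze(0)  # (1, 2, H, W)
    return transform_net(spliced)[0, 0]                      # (H, W) target matrix
```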
  • The human body detection method is implemented by a human body detection model; the human body detection model includes the first feature extraction network and/or the feature fusion neural network. The human body detection model is trained using sample images in a training sample set, and the sample images are annotated with the actual position information of the bone key points of the human skeletal structure and the actual position information of the contour key points of the human body contour.
  • The human body detection model obtained through this training method has higher detection accuracy, and can produce a human body detection result that balances fineness of representation against the amount of computation.
  • Embodiments of the present disclosure also provide a human body detection apparatus, including: an acquisition module for acquiring an image to be detected; a detection module for determining, based on the image to be detected, the position information of bone key points used to characterize the human skeletal structure and the position information of contour key points used to characterize the human body contour; and a generating module for generating the human body detection result based on the position information of the bone key points and the position information of the contour key points.
  • embodiments of the present disclosure also provide a computer device, including: a processor, a non-transitory storage medium, and a bus.
  • the non-transitory storage medium stores machine-readable instructions executable by the processor.
  • When the machine-readable instructions are executed by the processor, the processor performs the steps of the first aspect, or of any possible implementation of the first aspect; the processor and the storage medium communicate through the bus.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, it performs the steps of the first aspect, or of any possible implementation of the first aspect.
  • The embodiments of the present disclosure can determine, from the image to be detected, the position information of bone key points characterizing the human skeletal structure and the position information of contour key points characterizing the human body contour, and generate the human body detection result from both, improving the fineness of the representation while keeping the amount of computation in check.
  • Fig. 1 shows a flowchart of a human body detection method provided by an embodiment of the present disclosure.
  • Fig. 2a shows an example of the positions of contour key points and bone key points provided by an embodiment of the present disclosure.
  • Fig. 2b shows an example of the positions of main contour key points and auxiliary contour key points provided by an embodiment of the present disclosure.
  • Fig. 2c shows another example of the positions of the main contour key points and the auxiliary contour key points provided by the embodiments of the present disclosure.
  • Fig. 2d shows another example of the positions of the main contour key points and the auxiliary contour key points provided by the embodiments of the present disclosure.
  • Fig. 3 shows a schematic structural diagram of a first feature extraction network provided by an embodiment of the present disclosure.
  • Fig. 4 shows a flowchart of a feature extraction method provided by an embodiment of the present disclosure.
  • Fig. 5 shows a schematic structural diagram of a feature fusion network provided by an embodiment of the present disclosure.
  • Fig. 6 shows a flowchart of a feature fusion method provided by an embodiment of the present disclosure.
  • Fig. 7 shows a schematic structural diagram of another feature fusion network provided by an embodiment of the present disclosure.
  • Fig. 8 shows a flowchart of another feature fusion method provided by an embodiment of the present disclosure.
  • Fig. 9a shows a schematic diagram of an iterative update process using a scattering convolution operator provided by an embodiment of the present disclosure.
  • Fig. 9b shows a schematic diagram of an iterative update process using an aggregated convolution operator provided by an embodiment of the present disclosure.
  • Fig. 10 shows a schematic structural diagram of another feature fusion network provided by an embodiment of the present disclosure.
  • Fig. 11 shows a flowchart of another feature fusion method provided by an embodiment of the present disclosure.
  • Fig. 12 shows examples of bone key points and contour key points provided by an embodiment of the present disclosure.
  • Fig. 13 shows a specific example of performing displacement transformation on elements in a two-dimensional feature matrix provided by an embodiment of the present disclosure.
  • Fig. 14 shows a schematic structural diagram of a second feature extraction network provided by an embodiment of the present disclosure.
  • Fig. 15 shows a schematic diagram of a human body detection device provided by an embodiment of the present disclosure.
  • Fig. 16 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
  • Skeleton key point detection method: in this method, the human bone key points are extracted from the image through a neural network model, and the corresponding human body detection result is obtained based on the bone key points. This human body representation carries a small amount of data, so subsequent processing based on the detection result requires little computation; it is mostly used in fields such as human posture and action recognition, behavior detection, and human-computer interaction based on human posture. Because this method cannot extract the contour information of the human body, however, the resulting human body detection result has a low degree of characterization.
  • Semantic segmentation method: in this method, a semantic segmentation model estimates the probability that each pixel in the image belongs to the human body, and the human body detection result is obtained from those probabilities. Although the contour information of the human body can be obtained completely, the human body recognition result involves a relatively large amount of computation.
  • In view of this, the present disclosure provides a human body detection method and apparatus, computer device, and storage medium, which perform feature extraction on the image to be detected to extract the bone features and contour features of the human body, and fuse the extracted bone and contour features to obtain the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the human body contour. The human body detection result obtained in this way carries less data while reflecting both the skeletal and contour characteristics of the human body, improving the fineness of the representation; the information characterizing the human body is richer and supports broader application scenarios.
  • the human body detection method can be applied to any device with data processing capability, such as a computer.
  • Fig. 1 is a flowchart of a human body detection method provided by an embodiment of the present disclosure, in which:
  • S101 Acquire an image to be detected.
  • S102 Based on the image to be detected, determine the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the contour of the human body.
  • S103 Generate a human body detection result based on the position information of the bone key points and the position information of the contour key points.
  • The image to be detected may be, for example, an image taken by a camera installed at the target location, an image sent by other computer equipment, or a pre-saved image read from a local database, etc.
  • The image to be detected may or may not include a human body image. If it does, the final human body detection result can be obtained by the human body detection method provided by the embodiment of the present disclosure; if it does not, the human body detection result obtained is, for example, empty.
  • the bone key points can be used to characterize the bone features of the human body, and the bone features include the characteristics of the joint parts of the human body.
  • the joints are, for example, elbow joints, wrist joints, shoulder joints, neck joints, hip joints, knee joints, ankle joints, and the like.
  • bone key points can also be set on the human head.
  • Contour key points can be used to characterize the contour features of the human body. They can include only main contour key points, as shown in Fig. 2a, or both main contour key points and auxiliary contour key points, as shown in Figs. 2b to 2d; Figs. 2b to 2d are partial views of the parts within the line frames in Fig. 2a.
  • The main contour key points are the contour key points that characterize the contours of the human body joints, as shown in Fig. 2a, such as the contours of the elbow, wrist, shoulder, neck, hip, knee, and ankle joints; they generally appear in correspondence with the bone key points that characterize the same joints.
  • The auxiliary contour key points are contour key points that characterize the human body contour between the main contour key points.
  • bone key points and outline key points involved in the above drawings and text descriptions are only examples to facilitate the understanding of the present disclosure.
  • the number and positions of bone key points and contour key points can be appropriately adjusted according to the actual scene, which is not limited in the present disclosure.
  • In the case where the contour key points include the main contour key points and the auxiliary contour key points, the following method can be used to determine, based on the image to be detected, the position information of the contour key points used to characterize the human body contour: first determine the position information of the main contour key points; then determine the human body contour information based on the position information of the main contour key points; and finally determine the position information of multiple auxiliary contour key points based on the determined human body contour information.
  • the position information of the main contour key points can be determined directly based on the image to be detected.
  • Specifically, feature extraction is performed on the image to be detected to obtain bone features and contour features, the obtained bone features and contour features are fused, and the position information of the bone key points and the position information of the contour key points are determined based on the feature fusion result.
  • The bone and contour feature extraction and fusion can be performed in, but is not limited to, either of the following manners A or B.
  • A: Perform one feature extraction on the image to be detected, and perform feature fusion on the bone features and contour features obtained from that extraction.
  • the following describes the feature extraction process and feature fusion process in a1 and a2 respectively.
  • In the feature extraction process, the pre-trained first feature extraction network can be used to extract, from the image to be detected, the first target bone feature matrix of the bone key points used to characterize the human bone features, and the first target contour feature matrix of the contour key points used to characterize the human body contour features.
  • an embodiment of the present disclosure provides a schematic structural diagram of a first feature extraction network.
  • the first feature extraction network includes: a common feature extraction network, a first bone feature extraction network, and a first contour feature extraction network.
  • Referring to Fig. 4, an embodiment of the present disclosure also provides a specific process of extracting the first target bone feature matrix and the first target contour feature matrix from the image to be detected based on the first feature extraction network of Fig. 3, including the following steps.
  • S401 Use the common feature extraction network to perform convolution processing on the image to be detected to obtain a basic feature matrix including bone features and contour features.
  • The image to be detected can be represented as an image matrix. If it is a single-color-channel image, such as a grayscale image, it can be represented as a two-dimensional image matrix whose elements correspond one-to-one with the pixels of the image; the value of each element is the pixel value of the corresponding pixel. If it is a multi-color-channel image, such as an image in RGB format, it can be represented as a three-dimensional image matrix comprising one two-dimensional image matrix per color channel (for example R, G, and B); the value of each element in any of these two-dimensional matrices is the pixel value of the corresponding pixel under the corresponding color channel, as in the snippet below.
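A quick illustration of those two representations (the file name is a placeholder):

```python
import numpy as np
from PIL import Image

# Grayscale: a 2-D matrix whose elements are the pixel values.
gray = np.asarray(Image.open("person.jpg").convert("L"))   # shape (H, W)

# RGB: a 3-D matrix, one 2-D matrix per colour channel.
rgb = np.asarray(Image.open("person.jpg").convert("RGB"))  # shape (H, W, 3)
```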
  • The common feature extraction network includes at least one convolutional layer; after the image matrix of the image to be detected is input to the common feature extraction network, the image matrix is convolved to extract the features of the image to be detected.
  • the extracted features include both bone features and contour features.
  • S402 Use the first bone feature extraction network to perform convolution processing on the basic feature matrix to obtain a first bone feature matrix, and obtain a second bone feature matrix from the first target convolutional layer in the first bone feature extraction network; obtain the first target bone feature matrix based on the first bone feature matrix and the second bone feature matrix. The first target convolutional layer is any convolutional layer except the last convolutional layer in the first bone feature extraction network.
  • the first bone feature extraction network includes multiple convolutional layers.
  • the multiple convolutional layers are connected in sequence, and the input of the next convolutional layer is the output of the previous convolutional layer.
  • the first skeleton feature extraction network with this structure can perform multiple convolution processing on the basic feature matrix, and obtain the first skeleton feature matrix from the last convolution layer.
  • the first skeleton feature matrix is a three-dimensional feature matrix; the three-dimensional feature matrix includes a plurality of two-dimensional feature matrices, and each two-dimensional feature matrix corresponds to a plurality of predetermined bone key points in a one-to-one correspondence.
  • the value of an element in the two-dimensional feature matrix corresponding to a certain bone key point represents the probability that the pixel corresponding to the element belongs to the bone key point, and there are generally multiple pixels corresponding to one element.
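Given those per-keypoint probability matrices, the keypoint positions follow from the arg-max of each 2-D matrix; a small helper, assuming a channel-first stack:

```python
import torch

def keypoints_from_heatmaps(stack: torch.Tensor) -> torch.Tensor:
    """stack: (K, h, w), one 2-D feature matrix per key point, each
    element scoring the probability that its pixels belong to that key
    point.  The arg-max element marks the key point's (row, col)."""
    k, h, w = stack.shape
    idx = stack.reshape(k, -1).argmax(dim=1)
    rows = torch.div(idx, w, rounding_mode="floor")
    return torch.stack([rows, idx % w], dim=1)  # (K, 2) positions
```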
  • Although the bone features of the human body can be extracted from the basic feature matrix, some information in the image to be detected is lost as the number of convolutions increases, and the lost information may include information relevant to the bone features; if too much information is lost, the first target bone feature matrix of the bone key points used to characterize the human bone features may not be precise enough. Therefore, in the embodiment of the present disclosure, the second bone feature matrix is also obtained from the first target convolutional layer of the first bone feature extraction network, and the first target bone feature matrix is obtained based on the first bone feature matrix and the second bone feature matrix.
  • the first target convolutional layer is any convolutional layer except the last convolutional layer in the first bone feature extraction network.
  • For example, the penultimate convolutional layer in the first bone feature extraction network is selected as the first target convolutional layer.
  • the following method can be used to obtain the first target bone feature matrix based on the first bone feature matrix and the second bone feature matrix:
  • the first bone feature matrix and the second bone feature matrix are spliced to obtain the first spliced bone feature matrix; the first spliced bone feature matrix is subjected to dimensional transformation processing to obtain the first target bone feature matrix.
  • Here, a dimensional transformation neural network is used to perform convolution processing on the first spliced bone feature matrix at least once to obtain the first target bone feature matrix.
  • The dimensional transformation neural network can fuse the feature information carried in the first bone feature matrix and the second bone feature matrix, so that the first target bone feature matrix obtained contains richer information; a structural sketch of S402 follows.
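A sketch of the S402 branch under the penultimate-layer choice above; the layer count and channel widths are placeholders.

```python
import torch
import torch.nn as nn

class BoneBranch(nn.Module):
    """Stacked convolutions over the basic feature matrix; the
    penultimate layer is tapped as the first target convolutional
    layer, its output spliced with the final output, and a 1x1
    convolution performs the dimensional transformation."""
    def __init__(self, c: int = 64, n_bone: int = 14):
        super().__init__()
        self.early = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU())
        self.penultimate = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU())
        self.last = nn.Conv2d(c, n_bone, 3, padding=1)
        self.dim_transform = nn.Conv2d(c + n_bone, n_bone, 1)

    def forward(self, base):
        x = self.early(base)
        second = self.penultimate(x)                 # second bone feature matrix
        first = self.last(second)                    # first bone feature matrix
        spliced = torch.cat([first, second], dim=1)  # first spliced bone feature matrix
        return self.dim_transform(spliced)           # first target bone feature matrix
```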
  • S403 Use the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, and obtain a second contour feature matrix from the second target convolutional layer in the first contour feature extraction network; obtain the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix. The second target convolutional layer is any convolutional layer except the last convolutional layer in the first contour feature extraction network.
  • For example, the penultimate convolutional layer in the first contour feature extraction network is selected as the second target convolutional layer.
  • the first contour feature extraction network also includes multiple convolutional layers.
  • the multiple convolutional layers are connected in sequence, and the input of the next convolutional layer is the output of the previous convolutional layer.
  • the first contour feature extraction network with this structure can perform multiple convolution processing on the basic feature matrix, and obtain the first contour feature matrix from the last convolution layer.
  • the first contour feature matrix is a three-dimensional feature matrix; the three-dimensional feature matrix includes a plurality of two-dimensional feature matrices, and each two-dimensional feature matrix corresponds to a plurality of predetermined contour key points in a one-to-one correspondence.
  • the value of an element in the two-dimensional feature matrix corresponding to a certain contour key point represents the probability that the pixel point corresponding to the element belongs to the contour key point, and there are generally multiple pixels corresponding to one element.
  • The number of contour key points generally differs from the number of bone key points, so the number of two-dimensional feature matrices included in the first contour feature matrix can differ from the number included in the first bone feature matrix. For example, if the number of bone key points is 14 and the number of contour key points is 25, the first contour feature matrix includes 25 two-dimensional feature matrices and the first bone feature matrix includes 14.
  • The second contour feature matrix can be obtained from the second target convolutional layer in the first contour feature extraction network in a manner similar to S402 above, and the first target contour feature matrix is then obtained based on the first contour feature matrix and the second contour feature matrix.
  • the manner of obtaining the first target contour feature matrix includes, for example:
  • the first contour feature matrix and the second contour feature matrix are spliced to obtain the first spliced contour feature matrix; the first spliced contour feature matrix is subjected to dimensional transformation processing to obtain the first target contour feature matrix.
  • The first target bone feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each dimension, so that subsequent feature fusion processing can be performed on the two matrices.
  • For example, if the first target bone feature matrix has 3 dimensions of sizes 64, 32, and 14, its dimensions are expressed as 64*32*14; the dimensions of the first target contour feature matrix are then also expressed as 64*32*14.
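In code, this precondition is a simple shape check (the channel-last layout here merely mirrors the 64*32*14 notation; an assumption):

```python
import torch

bone_target = torch.zeros(64, 32, 14)     # 64*32*14, as in the example
contour_target = torch.zeros(64, 32, 14)
# Same number of dimensions and same size in each dimension: the
# precondition for the subsequent feature fusion.
assert bone_target.shape == contour_target.shape
```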
  • the first target skeleton feature matrix and the first target contour feature matrix can also be obtained in the following manner:
  • Use the first bone feature extraction network to perform convolution processing on the basic feature matrix to obtain the first bone feature matrix, and perform dimensional transformation processing on the first bone feature matrix to obtain the first target bone feature matrix; use the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, and perform dimensional transformation processing on the first contour feature matrix to obtain the first target contour feature matrix.
  • the bone features and contour features of the human body can also be extracted from the image to be detected with higher accuracy.
  • the first feature extraction network provided in the embodiments of the present disclosure is obtained by pre-training.
  • the human body detection method provided by the embodiments of the present disclosure is implemented by a human body detection model;
  • the human body detection model includes: a first feature extraction network and/or a feature fusion neural network;
  • the human body detection model is obtained by training using sample images in the training sample set, and the sample images are marked with actual position information of the key points of the human skeleton and the actual position information of the key points of the outline of the human body.
  • The first feature extraction network may be trained separately, jointly with the feature fusion neural network, or through a combination of separate and joint training.
  • the process of training to obtain the first feature extraction network includes but is not limited to the following (1) and (2).
  • the individual training of the first feature extraction network includes, for example:
  • Step 1.1 Obtain multiple sample images and the annotation data of each sample image;
  • The annotation data includes: the actual position information of the bone key points used to characterize the human skeletal structure, and the actual position information of the contour key points used to characterize the contour of the human body;
  • Step 1.2 Input multiple sample images into the first basic feature extraction network to obtain the first sample target bone feature matrix and the first sample target contour feature matrix;
  • Step 1.3 Determine the first predicted position information of the skeleton key points based on the first sample target skeleton feature matrix; and determine the first predicted position information of the contour key points based on the first sample target contour feature matrix;
  • Step 1.4 Determine the first loss based on the actual position information of the bone key points and the first predicted position information of the bone key points; and determine the second loss based on the actual position information of the contour key points and the first predicted position information of the contour key points;
  • Step 1.5 Perform this round of training on the first basic feature extraction network based on the first loss and the second loss;
  • When training is complete, the first feature extraction network is obtained.
  • The first loss is LS1 in Fig. 3 and the second loss is LC1 in Fig. 3. Supervising the training of the first basic feature extraction network with the first and second losses yields a first feature extraction network with higher accuracy; one training round might look like the sketch below.
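A hedged sketch of steps 1.2 to 1.5, assuming the annotated actual positions are rendered as ground-truth heatmaps and that MSE stands in for the unspecified loss functions LS1 and LC1:

```python
import torch.nn.functional as F

def training_round(extractor, images, bone_gt_hm, contour_gt_hm, optimizer):
    """One round of the separate training in (1): the first loss (LS1)
    supervises the bone output, the second loss (LC1) the contour
    output.  Heatmap ground truth and MSE are assumptions."""
    pred_bone, pred_contour = extractor(images)     # steps 1.2 / 1.3
    ls1 = F.mse_loss(pred_bone, bone_gt_hm)         # first loss
    lc1 = F.mse_loss(pred_contour, contour_gt_hm)   # second loss
    loss = ls1 + lc1                                # step 1.5
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return ls1.item(), lc1.item()
```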
  • Step 2.1 Obtain multiple sample images and the annotation data of each sample image;
  • The annotation data includes: the actual position information of the bone key points used to characterize the human skeletal structure, and the actual position information of the contour key points used to characterize the human body contour;
  • Step 2.2 Input multiple sample images into the first basic feature extraction network to obtain the first sample target bone feature matrix and the first sample target contour feature matrix;
  • Step 2.3 Use the basic feature fusion neural network to perform feature fusion on the first sample target skeleton feature matrix and the first sample target contour feature matrix to obtain the second sample target skeleton feature matrix and the second sample target contour feature matrix.
  • Step 2.4 Determine the second predicted position information of the skeleton key points based on the second sample target skeleton feature matrix; and determine the second predicted position information of the contour key points based on the second sample target contour feature matrix;
  • Step 2.5 Determine the third loss based on the actual position information of the bone key points and the second predicted position information of the bone key points; and determine the fourth loss based on the actual position information of the contour key points and the second predicted position information of the contour key points;
  • Step 2.6 Based on the third loss and the fourth loss, perform this round of training on the first basic feature extraction network and the basic feature fusion neural network;
  • When training is complete, the first feature extraction network and the feature fusion neural network are obtained.
  • the process in (1) can be used to pre-train the first feature extraction network; the first feature extraction network obtained after the pre-training is combined with the feature fusion neural network to perform the joint training in (2) above.
  • sample images used for the individual training and joint training of the first feature extraction network may be the same or different.
  • Before the joint training of the first feature extraction network and the feature fusion neural network, the feature fusion neural network can also be pre-trained; the pre-trained feature fusion neural network is then used for joint training with the first feature extraction network.
  • After obtaining the first target bone feature matrix of the bone key points used to characterize the human bone features and the first target contour feature matrix of the contour key points used to characterize the human body contour features, feature fusion processing can be performed based on the first target bone feature matrix and the first target contour feature matrix.
  • The first bone feature extraction network extracts bone features from the basic feature matrix, and the first contour feature extraction network extracts contour features from the basic feature matrix; the two processes are independent of each other. For the same human body, however, there is a correlation between contour features and bone features, and the purpose of fusing them is to exploit this mutual influence relationship.
• In this way, the position information of the finally extracted bone key points can be corrected based on the contour features, and the position information of the finally extracted contour key points can be corrected based on the bone features, so as to obtain more accurate position information of the bone key points and of the contour key points, and thus a higher-precision human body detection result.
• The embodiments of the present disclosure provide a specific method for feature fusion of the extracted bone features and contour features, including: using a pre-trained feature fusion neural network to perform feature fusion on the first target bone feature matrix and the first target contour feature matrix to obtain a second target bone feature matrix and a second target contour feature matrix.
• The second target bone feature matrix is a three-dimensional bone feature matrix that includes a two-dimensional bone feature matrix corresponding to each bone key point; the value of each element in a two-dimensional bone feature matrix represents the probability that the pixel corresponding to that element belongs to the corresponding bone key point (that is, the bone key point corresponding to that two-dimensional bone feature matrix).
• The second target contour feature matrix is a three-dimensional contour feature matrix that includes a two-dimensional contour feature matrix corresponding to each contour key point; the value of each element in a two-dimensional contour feature matrix represents the probability that the pixel corresponding to that element belongs to the corresponding contour key point.
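• Because each two-dimensional feature matrix holds, per pixel, the probability of belonging to one key point, the position of that key point can be recovered as the arg-max of its map. A minimal NumPy sketch, assuming a (K, H, W) matrix layout:

```python
import numpy as np

def decode_keypoints(feature_matrix):
    """feature_matrix: (K, H, W) array, one 2-D probability map per key point.
    Returns a (K, 2) array of (row, col) positions, one per key point."""
    k, h, w = feature_matrix.shape
    flat_idx = feature_matrix.reshape(k, -1).argmax(axis=1)
    return np.stack(np.unravel_index(flat_idx, (h, w)), axis=1)

# e.g. 14 bone key points predicted on a 64x64 map
positions = decode_keypoints(np.random.rand(14, 64, 64))
```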
• The feature fusion neural network provided in the embodiments of the present disclosure can be trained separately, can be trained jointly with the first feature extraction network, or can be trained by combining separate training and joint training.
  • the process of feature fusion of bone features and contour features may include but is not limited to at least one of the following M1 to M3.
  • an embodiment of the present disclosure provides a specific structure of a feature fusion neural network, including: a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network.
• Based on the feature fusion neural network provided in FIG. 5, an embodiment of the present disclosure also provides a specific method of performing feature fusion on the first target bone feature matrix and the first target contour feature matrix to obtain the second target bone feature matrix and the second target contour feature matrix, which includes the following steps.
  • S601 Use the first convolutional neural network to perform convolution processing on the first target skeleton feature matrix to obtain the first intermediate skeleton feature matrix. Go to S603.
• The first convolutional neural network includes at least one convolutional layer. If the first convolutional neural network has multiple convolutional layers, they are connected in sequence, and the input of each convolutional layer is the output of the previous convolutional layer.
  • the first target skeleton feature matrix is input to the first convolutional neural network, and each convolution layer is used to perform convolution processing on the first target skeleton feature matrix to obtain the first intermediate skeleton feature matrix.
• The purpose of this process is to further extract the bone features from the first target bone feature matrix.
  • S602 Use the second convolutional neural network to perform convolution processing on the first target contour feature matrix to obtain a first intermediate contour feature matrix. Go to S604.
  • S603 Perform splicing processing on the first intermediate contour feature matrix and the first target bone feature matrix to obtain the first spliced feature matrix; and use the first transformation neural network to perform dimensional transformation on the first spliced feature matrix to obtain the second target bone feature matrix.
  • the first intermediate contour feature matrix and the first target bone feature matrix are spliced to obtain the first spliced feature matrix, so that the obtained first spliced feature matrix includes both the contour feature and the bone feature.
• Using the first transformation neural network to perform further dimensional transformation on the first spliced feature matrix is, in effect, using the first transformation neural network to extract bone features from the first spliced feature matrix again. In the process of obtaining the first spliced feature matrix, the features other than the bone features and the contour features in the image to be detected were removed, so the matrix contains only bone features and contour features. Therefore, the bone features contained in the second target bone feature matrix obtained from the first spliced feature matrix are influenced by the contour features; the relationship between the bone features and the contour features is thereby established, and the fusion of the bone features and the contour features is realized.
  • S604 Perform splicing processing on the first intermediate skeleton feature matrix and the first target contour feature matrix to obtain a second spliced feature matrix, and use the second transform neural network to perform dimensional transformation on the second spliced feature matrix to obtain the second target contour feature matrix.
• The process of splicing the first intermediate bone feature matrix and the first target contour feature matrix to obtain the second spliced feature matrix is similar to the process of obtaining the first spliced feature matrix in S603, and will not be repeated here.
• In this way, the contour features contained in the second target contour feature matrix are affected by the bone features; the correlation between the bone features and the contour features is established, and the fusion of the bone features and the contour features is realized, as illustrated by the sketch below.
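• The following PyTorch sketch illustrates the S601-S604 structure: two convolution branches, cross-wise splicing, and two transformation networks. Channel counts, layer depths, and the use of 1x1 convolutions for the dimensional transformation are assumptions for illustration, not taken from this disclosure.

```python
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Illustrative sketch of the FIG. 5 fusion structure (S601-S604)."""
    def __init__(self, channels=64):
        super().__init__()
        # first / second convolutional neural networks (S601 / S602)
        self.bone_conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.contour_conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        # first / second transformation neural networks (S603 / S604):
        # restore the spliced matrix to the original channel dimension
        self.bone_transform = nn.Conv2d(2 * channels, channels, 1)
        self.contour_transform = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, bone, contour):
        mid_bone = self.bone_conv(bone)           # first intermediate bone matrix
        mid_contour = self.contour_conv(contour)  # first intermediate contour matrix
        # S603: splice contour branch into the bone path, then transform
        bone_out = self.bone_transform(torch.cat([mid_contour, bone], dim=1))
        # S604: splice bone branch into the contour path, then transform
        contour_out = self.contour_transform(torch.cat([mid_bone, contour], dim=1))
        return bone_out, contour_out
```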
  • the feature fusion neural network can be individually trained in the following manner.
  • Step 3.1 Obtain the first sample target bone feature matrix and the first sample target contour feature matrix of multiple sample images.
  • the method of obtaining is similar to the method of obtaining the first target skeleton feature matrix and the first target contour feature matrix in the foregoing embodiment, and will not be repeated here. It can be acquired in the case of joint training with the first feature extraction network, or can be acquired by using a pre-trained first feature extraction network.
  • Step 3.2 Use the first basic convolutional neural network to perform convolution processing on the target bone feature matrix of the first sample to obtain the middle bone feature matrix of the first sample.
  • Step 3.3 Use the second basic convolutional neural network to perform convolution processing on the target contour feature matrix of the first sample to obtain the middle contour feature matrix of the first sample.
• Step 3.4 Perform splicing processing on the middle contour feature matrix of the first sample and the target bone feature matrix of the first sample to obtain the spliced feature matrix of the first sample; and use the first basic transformation neural network to perform dimensional transformation on the spliced feature matrix of the first sample to obtain the target bone feature matrix of the second sample.
• Step 3.5 Perform splicing processing on the middle bone feature matrix of the first sample and the target contour feature matrix of the first sample to obtain the spliced feature matrix of the second sample; and use the second basic transformation neural network to perform dimensional transformation on the spliced feature matrix of the second sample to obtain the target contour feature matrix of the second sample.
  • Step 3.6 Determine the third predicted position information of the skeleton key point based on the second sample target skeleton feature matrix; and determine the third predicted position information of the contour key point based on the second sample target contour feature matrix.
• Step 3.7 Determine the fifth loss based on the actual position information of the bone key points and the third predicted position information of the bone key points; and determine the sixth loss based on the actual position information of the contour key points and the third predicted position information of the contour key points.
  • Step 3.8 Perform this round of training on the first basic convolutional neural network, the second basic convolutional neural network, the first basic transformation neural network, and the second basic transformation neural network based on the fifth loss and the sixth loss;
• After multiple rounds of training on the first basic convolutional neural network, the second basic convolutional neural network, the first basic transformation neural network, and the second basic transformation neural network, a trained feature fusion neural network is obtained.
  • the fifth loss is LS2 in FIG. 5; the sixth loss is LC2 in FIG. 5.
• The specific structure of another feature fusion neural network includes: a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, a fourth convolutional neural network, a third transformation neural network, and a fourth transformation neural network.
• Based on the feature fusion neural network provided in FIG. 7, an embodiment of the present disclosure also provides a specific method of performing feature fusion on the first target bone feature matrix and the first target contour feature matrix to obtain the second target bone feature matrix and the second target contour feature matrix, which includes the following steps.
• S801 Use the first directional convolutional neural network to perform directional convolution processing on the first target bone feature matrix to obtain the first directional bone feature matrix; and use the third convolutional neural network to perform convolution processing on the first directional bone feature matrix to obtain the second intermediate bone feature matrix. Go to S804.
• S802 Use the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix to obtain the first directional contour feature matrix; and use the fourth convolutional neural network to perform convolution processing on the first directional contour feature matrix to obtain the second intermediate contour feature matrix. Go to S803.
  • S803 Perform splicing processing on the second intermediate contour feature matrix and the first target bone feature matrix to obtain a third spliced feature matrix; and use the third transformation neural network to perform dimensional transformation on the third spliced feature matrix to obtain the second target bone feature matrix.
  • S804 Perform splicing processing on the second intermediate skeleton feature matrix and the first target contour feature matrix to obtain a fourth spliced feature matrix, and use the fourth transform neural network to perform dimensional transformation on the fourth spliced feature matrix to obtain the second target contour feature matrix.
• The bone key points are usually concentrated on the skeleton of the human body, while the contour key points are concentrated on the outline of the human body, that is, distributed around the skeleton. Therefore, it is necessary to perform local spatial transformations for the bone features and the contour features respectively, for example, to transform the bone features toward the positions of the contour features in the contour feature matrix and to transform the contour features toward the positions of the bone features in the bone feature matrix, so as to better extract the bone features and contour features and realize the fusion of the bone features and the contour features.
• To this end, the embodiments of the present disclosure first use the first directional convolutional neural network to perform directional convolution processing on the first target bone feature matrix; directional convolution can effectively realize directional spatial transformation of the bone features at the feature level. Then, the third convolutional neural network is used to perform convolution processing on the obtained first directional bone feature matrix to obtain the second intermediate bone feature matrix. Since the bone features have undergone directional spatial transformation through the first directional convolutional neural network, the bone features actually move in the direction of the contour features. Then, the second intermediate bone feature matrix and the first target contour feature matrix are spliced to obtain the fourth spliced feature matrix.
  • the fourth splicing feature matrix includes not only contour features, but also bone features that have undergone directional spatial transformation. Then, the fourth transformation neural network is used to perform dimensional transformation on the fourth splicing feature matrix, that is, from the fourth splicing feature matrix, the contour feature is extracted again.
  • the second target contour feature matrix obtained in this way will be affected by the bone features, and the fusion between the bone features and the contour features is realized.
• Similarly, the embodiments of the present disclosure first use the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix; directional convolution can effectively realize directional spatial transformation of the contour features at the feature level. Then, the fourth convolutional neural network is used to perform convolution processing on the obtained first directional contour feature matrix to obtain the second intermediate contour feature matrix. Since the contour features have undergone directional spatial transformation through the second directional convolutional neural network, the contour features actually move in the direction of the bone features. Then, the second intermediate contour feature matrix and the first target bone feature matrix are spliced to obtain the third spliced feature matrix.
• The third spliced feature matrix therefore includes not only the bone features but also the contour features that have undergone directional spatial transformation. Then, the third transformation neural network is used to perform dimensional transformation on the third spliced feature matrix, that is, the bone features are extracted again from the third spliced feature matrix.
  • the second target bone feature matrix obtained in this way will be affected by the contour feature, and the fusion between the bone feature and the contour feature is realized.
  • directional convolution consists of multiple iterative convolution steps, and effective directional convolution meets the following requirements:
• A feature function sequence {F_k} can be defined to control the update order of the elements. The input of the function F_k is the position of each element in the first target bone feature matrix, and the output of F_k indicates whether to update that element in the k-th iteration; the output is 1 or 0, where 1 means update and 0 means no update.
• The update of the i-th iteration can be expressed as: T_i(X) = F_i ⊙ (W ⊗ T_{i-1}(X) + b) + (1 - F_i) ⊙ T_{i-1}(X), where T_0(X) = X; X represents the input of the directional convolution, that is, the first target bone feature matrix; ⊗ denotes the convolution operation and ⊙ denotes element-wise multiplication; and W and b respectively represent the weight and bias shared across the iterations.
• In the embodiments of the present disclosure, a pair of symmetrical directional convolution operators can be set for the above feature function sequence, namely the scattering convolution operator F_i^S and the gathering convolution operator F_i^G.
  • the scattering convolution operator is responsible for sequentially updating the elements in the feature matrix from the inside to the outside; while the gathering convolution operator sequentially updates the elements in the feature matrix from the outside to the inside.
• When the first directional convolutional neural network performs directional convolution processing on the first target bone feature matrix, the bone feature elements are spatially transformed outward, toward the positions more related to the contour features, so the scattering convolution operator F_i^S is used; when the second directional convolutional neural network performs directional convolution processing on the first target contour feature matrix, the contour feature elements are spatially transformed toward the middle positions of the contour feature matrix (the positions more related to the bone features), so the gathering convolution operator F_i^G is used.
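• As an illustration of the iterative update above, the sketch below implements T_i(X) = F_i ⊙ (W ⊗ T_{i-1}(X) + b) + (1 - F_i) ⊙ T_{i-1}(X) with binary masks standing in for the feature function sequence. The ring-shaped mask construction (Chebyshev distance from the grid centre) and all sizes are assumptions; scattering masks update elements from the inside out, gathering masks from the outside in.

```python
import torch
import torch.nn as nn

class DirectionalConv(nn.Module):
    """Iterative directional update with a shared convolution (W, b) and a
    sequence of binary update masks F_1..F_T (1 = update, 0 = keep)."""
    def __init__(self, channels, masks):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # shared W, b
        self.masks = masks  # list of (1, 1, H, W) binary tensors

    def forward(self, x):
        t = x  # T_0(X) = X
        for mask in self.masks:
            t = mask * self.conv(t) + (1 - mask) * t
        return t

def ring_masks(h, w, steps, scatter=True):
    """Ring-shaped update masks over an h x w grid: scatter goes inside-out,
    gather outside-in. Purely illustrative mask construction."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    d = torch.maximum((ys - (h - 1) / 2).abs(), (xs - (w - 1) / 2).abs())
    order = d if scatter else d.max() - d
    top = float(order.max()) + 1e-6
    bins = torch.linspace(0.0, top, steps + 1)
    return [((order >= bins[i]) & (order < bins[i + 1])).float()[None, None]
            for i in range(steps)]

# scattering pass over a 5x5 grid (cf. FIG. 9a), two iterations:
scatter_conv = DirectionalConv(channels=16, masks=ring_masks(5, 5, steps=2))
```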
  • the first directional convolutional neural network performs directional convolution processing on the first target bone feature matrix as follows.
• The first target bone feature matrix is split into multiple sub-matrices, and each sub-matrix is called a grid. If the first target bone feature matrix is a three-dimensional matrix whose three dimensions are m, n, and s, its dimension is expressed as m*n*s; if the size of a grid is 5, the dimension of each grid can be expressed as 5*5*s.
• FIG. 9a shows a process of using the scattering convolution operator F_i^S to update the element values of the elements in a sub-matrix with a grid size of 5 twice.
• In FIG. 9a, a represents the original sub-matrix, b represents the sub-matrix obtained after one iteration, and c represents the sub-matrix obtained after two iterations, that is, the target sub-matrix.
• The target sub-matrices corresponding to the grids are spliced together to obtain the first directional bone feature matrix.
  • the second directional convolutional neural network performs directional convolution processing on the first target contour feature matrix as follows.
• The first target contour feature matrix is split into multiple sub-matrices, and each sub-matrix is called a grid. If the first target contour feature matrix is a three-dimensional matrix whose three dimensions are m, n, and s, its dimension is expressed as m*n*s; if the size of a grid is 5, the dimension of each grid can be expressed as 5*5*s.
  • a in FIG. 9b represents the original sub-matrix; b represents the sub-matrix obtained by performing one iteration, and c represents the sub-matrix obtained by performing two iterations, that is, the target sub-matrix.
  • the target sub-matrices corresponding to each grid are spliced together to obtain the first directional contour feature matrix.
• FIGS. 9a and 9b are only examples of using the scattering convolution operator F_i^S and the gathering convolution operator F_i^G to iteratively update the element values of the elements in a sub-matrix.
  • the feature fusion neural network can be individually trained in the following manner.
  • Step 4.1 Obtain the first sample target bone feature matrix and the first sample target contour feature matrix of the multiple sample images.
  • the method of obtaining is similar to the method of obtaining the first target skeleton feature matrix and the first target contour feature matrix in the foregoing embodiment, and will not be repeated here. It can be acquired in the case of joint training with the first feature extraction network, or can be acquired by using a pre-trained first feature extraction network.
• Step 4.2 Use the first basic directional convolutional neural network to perform directional convolution processing on the first sample target bone feature matrix to obtain the first sample directional bone feature matrix; obtain the seventh loss using the first sample directional bone feature matrix and the actual position information of the contour key points; and, based on the seventh loss, perform this round of training on the first basic directional convolutional neural network.
  • the seventh loss is LC3 in FIG. 7.
  • the first basic directional convolutional neural network is used to perform directional convolution processing on the first sample target bone feature matrix, that is, the first sample target bone feature matrix is subjected to directional spatial transformation.
• Step 4.3 Use the second basic directional convolutional neural network to perform directional convolution processing on the first sample target contour feature matrix to obtain the first sample directional contour feature matrix; obtain the eighth loss using the first sample directional contour feature matrix and the actual position information of the bone key points; and, based on the eighth loss, perform this round of training on the second basic directional convolutional neural network.
  • the eighth loss is LS3 in FIG. 7.
• Step 4.4 Use the fourth basic convolutional neural network to perform convolution processing on the first sample directional contour feature matrix to obtain the second sample middle contour feature matrix; splice the obtained second sample middle contour feature matrix with the first sample target bone feature matrix to obtain the third sample spliced feature matrix; and use the third basic transformation neural network to perform dimensional transformation on the third sample spliced feature matrix to obtain the second sample target bone feature matrix.
  • Step 4.5 Determine the fourth predicted position information of the bone key point based on the second sample target bone feature matrix; determine the ninth loss based on the actual position information of the bone key point and the fourth predicted position information of the bone key point.
  • the ninth loss is LS4 in FIG. 7.
• Step 4.6 Use the third basic convolutional neural network to perform convolution processing on the first sample directional bone feature matrix to obtain the second sample middle bone feature matrix; splice the obtained second sample middle bone feature matrix with the first sample target contour feature matrix to obtain the fourth sample spliced feature matrix; and use the fourth basic transformation neural network to perform dimensional transformation on the fourth sample spliced feature matrix to obtain the second sample target contour feature matrix.
  • Step 4.7 Determine the fourth predicted position information of the contour key point based on the second sample target contour feature matrix; determine the tenth loss based on the actual position information of the contour key point and the fourth predicted position information of the contour key point.
  • the tenth loss is LC4 in FIG. 7.
  • Step 4.8 Based on the ninth loss and the tenth loss, perform this round of training on the third basic convolutional neural network, the fourth basic convolutional neural network, the third basic transform neural network, and the fourth basic transform neural network.
• After performing multiple rounds of training on the first basic directional convolutional neural network, the second basic directional convolutional neural network, the third basic convolutional neural network, the fourth basic convolutional neural network, the third basic transformation neural network, and the fourth basic transformation neural network, a trained feature fusion neural network is obtained.
  • another specific structure of a feature fusion neural network includes: a displacement estimation neural network and a fifth transformation neural network.
• Based on the feature fusion neural network provided in FIG. 10, an embodiment of the present disclosure also provides a specific method of performing feature fusion on the first target bone feature matrix and the first target contour feature matrix to obtain the second target bone feature matrix and the second target contour feature matrix, which includes the following steps.
  • S1101 Perform splicing processing on the first target skeleton feature matrix and the first target contour feature matrix to obtain a fifth spliced feature matrix.
• S1102 Input the fifth spliced feature matrix into the displacement estimation neural network, perform displacement estimation on multiple predetermined key point pairs, and obtain the displacement information of one key point in each key point pair moving to the other key point. The two key points in each key point pair are adjacent to each other, and include a bone key point and a contour key point, or two bone key points, or two contour key points.
  • multiple bone key points and multiple contour key points are determined in advance for the human body.
• FIG. 12 provides an example of multiple bone key points and contour key points determined in advance for the human body.
• Each of the other bone key points corresponds to two contour key points, while the bone key points of the two hips correspond to the same contour key points.
• Every two key points directly connected by a line segment can form a key point pair. That is, the composition of a key point pair may have the following three situations: (bone key point, bone key point), (contour key point, contour key point), and (bone key point, contour key point).
• The displacement estimation neural network includes multiple convolutional layers connected in sequence, which perform feature learning on the bone features and contour features in the fifth spliced feature matrix to obtain the displacement information of one key point in each key point pair moving to the other key point; each key point pair corresponds to two sets of displacement information. A sketch of such a network follows below.
• For example, for a key point pair (P, Q), the displacement information of the pair includes the displacement information from P to Q and the displacement information from Q to P.
  • Each set of displacement information includes moving direction and moving distance.
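• The following sketch shows what such a displacement estimation network could look like. The pair list, the pooled regression head, and all layer sizes are assumptions; the only property taken from the description above is that every key point pair yields two directed displacements, encoded here as (dy, dx) offsets.

```python
import torch
import torch.nn as nn

class DisplacementEstimator(nn.Module):
    """Reads the fifth spliced feature matrix and regresses, for each of the
    predetermined key point pairs, two directed displacements (P->Q and Q->P),
    each as a (dy, dx) offset."""
    def __init__(self, in_channels, num_pairs):
        super().__init__()
        self.num_pairs = num_pairs
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # two directions per pair, two coordinates per direction
        self.head = nn.Linear(64, num_pairs * 2 * 2)

    def forward(self, spliced):
        out = self.head(self.features(spliced))
        return out.view(-1, self.num_pairs, 2, 2)  # (batch, pair, direction, dy/dx)
```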
• S1103 Use each key point in each key point pair as the current key point, and obtain, from the three-dimensional feature matrix corresponding to the other key point paired with the current key point, the two-dimensional feature matrix corresponding to that other key point. If the other key point of the pair is a bone key point, the three-dimensional feature matrix corresponding to that bone key point is the first bone feature matrix; if the other key point of the pair is a contour key point, the three-dimensional feature matrix corresponding to that contour key point is the first contour feature matrix.
  • S1104 According to the displacement information from another key point of the pair to the current key point, perform position transformation on an element in the two-dimensional feature matrix corresponding to the other key point of the pair to obtain a displacement feature matrix corresponding to the current key point.
• For example, when P is used as the current key point, the two-dimensional feature matrix corresponding to Q is obtained from the three-dimensional feature matrix corresponding to Q.
• If Q is a bone key point, the three-dimensional feature matrix corresponding to Q is the first bone feature matrix (see S402 above); if Q is a contour key point, the three-dimensional feature matrix corresponding to Q is the first contour feature matrix (see S403 above).
• In the former case, the first bone feature matrix is taken as the three-dimensional feature matrix of Q, and the two-dimensional feature matrix of Q is obtained from the first bone feature matrix. This is because the first bone feature matrix only includes bone features, which makes the bone features learned in the subsequent processing more targeted.
  • the first contour feature matrix is taken as the three-dimensional feature matrix of Q, and the two-dimensional feature matrix of Q is obtained from the first contour feature matrix. This is because only the contour features are included in the first contour feature matrix, which makes the contour features learned in the subsequent processing more pertinent.
• Then, according to the displacement information from Q to P, the positions of the elements in the two-dimensional feature matrix of Q are transformed to obtain the displacement feature matrix corresponding to P.
• For example, suppose the displacement information from Q to P is (2, 3), where 2 means that the distance moved in the first dimension is 2 and 3 means that the distance moved in the second dimension is 3, and the two-dimensional feature matrix of Q is as shown in FIG. 13 a; after performing position transformation on the elements in the two-dimensional feature matrix of Q, the displacement feature matrix corresponding to P is obtained as shown in FIG. 13 b.
  • the displacement information should be understood in combination with specific solutions.
  • the displacement information "2" can refer to 2 elements, 2 cells, and so on.
• In the above manner, the displacement feature matrix corresponding to each bone key point and the displacement feature matrix corresponding to each contour key point can be generated.
• It should be noted that each bone key point may be paired with multiple key points, so each bone key point may have multiple displacement feature matrices; likewise, each contour key point may be paired with multiple key points, so each contour key point may have multiple displacement feature matrices. For different contour key points, the numbers of corresponding displacement feature matrices may differ; for different bone key points, the numbers of corresponding displacement feature matrices may also differ.
• S1105 For each bone key point, perform splicing processing on the two-dimensional feature matrix corresponding to the bone key point and each displacement feature matrix corresponding to the bone key point to obtain the spliced two-dimensional feature matrix of the bone key point; input the spliced two-dimensional feature matrix of the bone key point into the fifth transformation neural network to obtain the target two-dimensional feature matrix corresponding to the bone key point; and generate the second target bone feature matrix based on the target two-dimensional feature matrices corresponding to the bone key points.
• S1106 For each contour key point, perform splicing processing on the two-dimensional feature matrix corresponding to the contour key point and each displacement feature matrix corresponding to the contour key point to obtain the spliced two-dimensional feature matrix of the contour key point; input the spliced two-dimensional feature matrix of the contour key point into the fifth transformation neural network to obtain the target two-dimensional feature matrix corresponding to the contour key point; and generate the second target contour feature matrix based on the target two-dimensional feature matrices corresponding to the contour key points.
• For example, if the two-dimensional feature matrix corresponding to P is P' and the three displacement feature matrices of P are P1', P2', and P3', then P', P1', P2', and P3' are spliced to obtain the spliced two-dimensional feature matrix of P.
• The three displacement feature matrices of P may be obtained by transforming the positions of the elements in the two-dimensional feature matrices corresponding to bone key points, or by transforming the positions of the elements in the two-dimensional feature matrices corresponding to contour key points.
• Splicing P', P1', P2', and P3' merges together the features of each key point adjacent to P in position. The fifth transformation neural network is then used to perform convolution processing on the spliced two-dimensional feature matrix of P, so that the obtained target two-dimensional feature matrix of P contains both bone features and contour features, which realizes the fusion of the bone features and the contour features.
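• A minimal sketch of this splice-and-transform step, fixing the number of displacement feature matrices at the example's three and assuming a single 3x3 convolution as the fifth transformation neural network:

```python
import torch
import torch.nn as nn

# fifth transformation neural network: collapses the spliced per-key-point
# maps back into one target 2-D feature matrix (single conv is an assumption)
fifth_transform = nn.Conv2d(in_channels=4, out_channels=1,
                            kernel_size=3, padding=1)

def fuse_keypoint(own_map, displacement_maps):
    """own_map: (H, W) matrix of the key point itself (P' in the example);
    displacement_maps: list of (H, W) shifted neighbour maps (P1', P2', P3').
    Returns the target 2-D feature matrix for the key point."""
    spliced = torch.stack([own_map, *displacement_maps])    # (4, H, W)
    return fifth_transform(spliced.unsqueeze(0)).squeeze()  # (H, W)

# e.g. P' plus three displacement feature matrices on a 64x64 map:
maps = [torch.rand(64, 64) for _ in range(4)]
target_map_for_p = fuse_keypoint(maps[0], maps[1:])
```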
  • the feature fusion neural network can be individually trained in the following manner.
  • Step 5.1 Obtain the first sample target bone feature matrix and the first sample target contour feature matrix of multiple sample images.
  • the method of obtaining is similar to the method of obtaining the first target skeleton feature matrix and the first target contour feature matrix in the foregoing embodiment, and will not be repeated here. It can be acquired in the case of joint training with the first feature extraction network, or can be acquired by using a pre-trained first feature extraction network.
  • Step 5.2 Perform splicing processing on the first sample target skeleton feature matrix and the first sample target contour feature matrix to obtain the fifth sample splicing feature matrix.
• Step 5.3 Input the fifth sample spliced feature matrix into the basic displacement estimation neural network, perform displacement estimation on the multiple predetermined key point pairs, and obtain the predicted displacement information of one key point in each key point pair moving to the other key point; the two key points in each key point pair are adjacent, and include a bone key point and a contour key point, or two bone key points, or two contour key points.
• Step 5.4 Use each key point in each key point pair as the current key point, and obtain, from the sample three-dimensional feature matrix corresponding to the other key point paired with the current key point, the sample two-dimensional feature matrix corresponding to that other key point.
• Step 5.5 According to the predicted displacement information from the other key point of the pair to the current key point, perform position transformation on the elements in the sample two-dimensional feature matrix corresponding to that other key point to obtain the sample displacement feature matrix corresponding to the current key point.
• Step 5.6 Determine the displacement loss according to the sample displacement feature matrix corresponding to the current key point and the sample two-dimensional feature matrix corresponding to the current key point.
• Step 5.7 Based on the displacement loss, perform this round of training on the basic displacement estimation neural network.
• Step 5.8 For each bone key point, splice the sample two-dimensional feature matrix corresponding to the bone key point and each sample displacement feature matrix corresponding to the bone key point to obtain the sample spliced two-dimensional feature matrix of the bone key point; input the sample spliced two-dimensional feature matrix of the bone key point into the fifth basic transformation neural network to obtain the sample target two-dimensional feature matrix corresponding to the bone key point; and generate the second sample target bone feature matrix based on the sample target two-dimensional feature matrices corresponding to the bone key points.
• Step 5.9 For each contour key point, splice the sample two-dimensional feature matrix corresponding to the contour key point and each sample displacement feature matrix corresponding to the contour key point to obtain the sample spliced two-dimensional feature matrix of the contour key point; input the sample spliced two-dimensional feature matrix of the contour key point into the fifth basic transformation neural network to obtain the sample target two-dimensional feature matrix corresponding to the contour key point; and generate the second sample target contour feature matrix based on the sample target two-dimensional feature matrices corresponding to the contour key points.
  • Step 5.10 Determine the transformation loss based on the second sample target skeleton feature matrix, the second sample target contour feature matrix, the actual position information of the skeleton key points, and the actual position information of the contour key points.
  • the predicted position information of the skeleton key points may be determined based on the second sample target skeleton feature matrix
  • the predicted position information of the contour key points may be determined based on the second sample target contour feature matrix.
  • the transformation loss is determined based on the predicted position information and actual position information of the bone key points, and the predicted position information and actual position information of the contour key points.
  • Step 5.11 Based on the transformation loss, perform this round of training on the fifth basic transformation neural network.
  • Step 5.12 After multiple rounds of training on the basic displacement estimation neural network and the fifth basic transformation neural network, a feature fusion neural network is obtained.
  • the i+1th feature extraction is performed based on the feature fusion result of the i-th feature fusion, and i is a positive integer.
  • the process of performing the first feature extraction is consistent with the process of extracting bone features and contour features from the image to be detected in A, and will not be repeated here.
• Each feature extraction in B except the first feature extraction includes: using a pre-trained second feature extraction network to extract, from the feature fusion result of the previous feature fusion, the bone features and contour features for the next feature fusion. The network parameters of the first feature extraction network and the second feature extraction network are different, and the network parameters of the second feature extraction networks used for different feature extractions are also different.
  • both the first feature extraction network and the second feature extraction network include multiple convolutional layers.
• The network parameters of the first feature extraction network and the second feature extraction network include, but are not limited to: the number of convolutional layers, the size of the convolution kernels used by each convolutional layer, the number of convolution kernels used by each convolutional layer, and so on.
  • an embodiment of the present disclosure provides a schematic structural diagram of a second feature extraction network.
  • the second feature extraction network includes: a second bone feature extraction network and a second contour feature extraction network.
• The feature fusion result of the last feature fusion, on which this feature extraction using the second feature extraction network is based, includes the second target bone feature matrix and the second target contour feature matrix; the specific process of obtaining the second target bone feature matrix and the second target contour feature matrix is described in A above and will not be repeated here. The specific process of this feature extraction is, for example, as follows:
• Use the second bone feature extraction network to perform convolution processing on the second target bone feature matrix obtained from the last feature fusion to obtain the third bone feature matrix, and obtain the fourth bone feature matrix from the third target convolutional layer in the second bone feature extraction network; based on the third bone feature matrix and the fourth bone feature matrix, the fifth target bone feature matrix is obtained.
  • the third target convolutional layer is any convolutional layer except the last convolutional layer in the second bone feature extraction network.
• Use the second contour feature extraction network to perform convolution processing on the second target contour feature matrix obtained from the last feature fusion to obtain the third contour feature matrix, and obtain the fourth contour feature matrix from the fourth target convolutional layer in the second contour feature extraction network; based on the third contour feature matrix and the fourth contour feature matrix, the sixth target contour feature matrix is obtained.
  • the fourth target convolutional layer is any convolutional layer except the last convolutional layer in the second contour feature extraction network.
  • the specific processing method is similar to the specific process of using the first skeleton feature extraction network to extract the first target skeleton feature matrix and the first target contour feature matrix from the image to be detected in A, and will not be repeated here.
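• For illustration, the sketch below mirrors the second bone feature extraction branch just described: the fused matrix is convolved to a third bone feature matrix, a fourth bone feature matrix is tapped from the third target convolutional layer, and the two are spliced and transformed. Depth, width, and the 1x1 transform are assumptions; the contour branch would be symmetric.

```python
import torch
import torch.nn as nn

class SecondBoneBranch(nn.Module):
    """Sketch of a second bone feature extraction network with an
    intermediate-layer tap (the 'third target convolutional layer')."""
    def __init__(self, channels=64, tap_index=1, depth=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(depth))
        self.tap_index = tap_index  # any layer except the last
        self.transform = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, fused_bone):
        x, tapped = fused_bone, None
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i == self.tap_index:
                tapped = x          # fourth bone feature matrix
        third = x                   # third bone feature matrix
        # splice and transform to obtain the fifth target bone feature matrix
        return self.transform(torch.cat([third, tapped], dim=1))
```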
  • the above embodiments describe the method of determining the position information of the bone key points and the contour key points in the above II.
  • the position of each bone key point and the position of the contour key points can be determined from the image to be detected.
  • the human body detection result can then be generated.
  • the human body detection result includes one or more of the following: a to-be-detected image including bone key point markers and contour key point markers; a data group including position information of the bone key points and position information of the contour key points.
• Based on the human body detection result, one or more of the following operations may be performed: human body motion recognition, human body posture detection, human body contour adjustment, human body image editing, and human body mapping.
• Human body motion recognition, for example, recognizes the current actions of the human body, such as fighting, running, and so on.
• Human body posture detection, for example, recognizes the current posture of the human body, such as lying down, or whether a specified action is made.
• Human body contour adjustment includes, for example, adjusting the body shape and height.
• Human body image editing includes, for example, zooming, rotating, and cropping.
• Human body mapping means, for example, that after the human body is detected in image A, the corresponding human body image is pasted into image B.
• The embodiments of the present disclosure can determine, from the image to be detected, the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the human body contour, and generate the human body detection result based on the position information of the bone key points and the position information of the contour key points, which improves the precision of the representation while keeping the amount of computation manageable.
• The information characterizing the human body is therefore richer and supports broader application scenarios, such as image editing and body shape adjustment.
• The embodiments of the present disclosure also provide a human body detection device corresponding to the human body detection method. Since the principle by which the device solves the problem is similar to that of the above human body detection method of the embodiments of the present disclosure, the implementation of the device can refer to the implementation of the method, and repeated details are not described here.
• The device includes: an acquisition module 151, a detection module 152, and a generation module 153. The acquisition module 151 is used to acquire an image to be detected;
• the detection module 152 is used to determine, based on the image to be detected, the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the human body contour;
• the generation module 153 is used to generate a human body detection result based on the position information of the bone key points and the position information of the contour key points.
  • the contour key points include a main contour key point and an auxiliary contour key point; wherein there is at least one auxiliary contour key point between two adjacent main contour key points.
• The detection module 152 is configured to determine the position information of the contour key points used to characterize the human body contour based on the image to be detected in the following manner: determine the position information of the main contour key points based on the image to be detected; determine the human body contour information based on the position information of the main contour key points; and determine the position information of multiple auxiliary contour key points based on the determined human body contour information.
• The human body detection result includes one or more of the following: an image to be detected to which bone key point markers and contour key point markers are added; and a data group including the position information of the bone key points and the position information of the contour key points.
• The human body detection device further includes: an execution module 154, configured to perform one or more of the following operations based on the human body detection result: human body motion recognition, human body posture detection, human body contour adjustment, human body image editing, and human body mapping.
• The detection module 152 is configured to determine, based on the image to be detected, the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the human body contour in the following manner: based on the image to be detected, perform feature extraction to obtain bone features and contour features, and perform feature fusion on the obtained bone features and contour features; and determine the position information of the bone key points and the position information of the contour key points based on the feature fusion result.
  • the detection module 152 is configured to perform feature extraction based on the image to be detected in the following manner to obtain bone features and contour features, and perform feature fusion on the obtained bone features and contour features: Based on the image to be detected, perform feature extraction at least once, and perform feature fusion on the bone features and contour features obtained from each feature extraction.
• In the case of performing multiple feature extractions, the i+1th feature extraction is performed based on the feature fusion result of the i-th feature fusion, where i is a positive integer;
• the detection module 152 is used to determine the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the human body contour based on the feature fusion result in the following manner: based on the feature fusion result of the last feature fusion, determine the position information of the bone key points and the position information of the contour key points.
• The detection module 152 is configured to perform at least one feature extraction based on the image to be detected in the following manner: in the first feature extraction, use a pre-trained first feature extraction network to extract, from the image to be detected, the first target bone feature matrix of the bone key points used to characterize the human bone features and the first target contour feature matrix of the contour key points used to characterize the human body contour features; in the i+1th feature extraction, use a pre-trained second feature extraction network to extract, from the feature fusion result of the i-th feature fusion, the first target bone feature matrix of the bone key points used to characterize the human bone features and the first target contour feature matrix of the contour key points used to characterize the human body contour features; wherein the network parameters of the first feature extraction network and the second feature extraction network are different, and the network parameters of the second feature extraction networks used for different feature extractions are different.
• The detection module 152 is configured to perform feature fusion on the extracted bone features and contour features in the following manner: use a pre-trained feature fusion neural network to perform feature fusion on the first target bone feature matrix and the first target contour feature matrix to obtain a second target bone feature matrix and a second target contour feature matrix; wherein the second target bone feature matrix is a three-dimensional bone feature matrix that includes a two-dimensional bone feature matrix corresponding to each bone key point, and the value of each element in a two-dimensional bone feature matrix represents the probability that the pixel corresponding to that element belongs to the corresponding bone key point; the second target contour feature matrix is a three-dimensional contour feature matrix that includes a two-dimensional contour feature matrix corresponding to each contour key point, and the value of each element in a two-dimensional contour feature matrix represents the probability that the pixel corresponding to that element belongs to the corresponding contour key point.
  • the detection module 152 is configured to determine the position information of the bone key points and the position information of the contour key points based on the feature fusion result of the last feature fusion in the following manner: Determine the position information of the skeleton key points based on the second target skeleton feature matrix obtained in the last feature fusion; and determine the position information of the contour key points based on the second target contour feature matrix obtained in the last feature fusion.
  • the first feature extraction network includes: a common feature extraction network, a first bone feature extraction network, and a first contour feature extraction network;
• the detection module 152 is configured to use the first feature extraction network to extract the first target bone feature matrix of the bone key points used to characterize the human bone features and the first target contour feature matrix of the contour key points used to characterize the human body contour features from the image to be detected in the following manner:
• use the common feature extraction network to perform convolution processing on the image to be detected to obtain the basic feature matrix; use the first bone feature extraction network to perform convolution processing on the basic feature matrix to obtain a first bone feature matrix, and obtain a second bone feature matrix from the first target convolutional layer in the first bone feature extraction network; based on the first bone feature matrix and the second bone feature matrix, obtain the first target bone feature matrix; the first target convolutional layer is any convolutional layer in the first bone feature extraction network other than the last convolutional layer;
• use the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, and obtain a second contour feature matrix from the second target convolutional layer in the first contour feature extraction network; based on the first contour feature matrix and the second contour feature matrix, obtain the first target contour feature matrix; the second target convolutional layer is any convolutional layer in the first contour feature extraction network other than the last convolutional layer.
  • the detection module 152 is configured to obtain the first target bone feature matrix based on the first bone feature matrix and the second bone feature matrix in the following manner:
  • the first bone feature matrix and the second bone feature matrix are spliced to obtain a first spliced bone feature matrix;
  • the first spliced bone feature matrix is subjected to dimensional transformation processing to obtain the first target bone feature matrix;
• the obtaining of the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix includes: performing splicing processing on the first contour feature matrix and the second contour feature matrix to obtain a first spliced contour feature matrix; and performing dimensional transformation processing on the first spliced contour feature matrix to obtain the first target contour feature matrix; wherein the first target bone feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each corresponding dimension.
  • the feature fusion neural network includes: a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network;
• The detection module 152 is configured to use the feature fusion neural network to perform feature fusion on the first target bone feature matrix and the first target contour feature matrix in the following manner to obtain the second target bone feature matrix and the second target contour feature matrix:
• use the first convolutional neural network to perform convolution processing on the first target bone feature matrix to obtain a first intermediate bone feature matrix; use the second convolutional neural network to perform convolution processing on the first target contour feature matrix to obtain a first intermediate contour feature matrix; splice the first intermediate contour feature matrix and the first target bone feature matrix to obtain a first spliced feature matrix, and use the first transformation neural network to perform dimensional transformation on the first spliced feature matrix to obtain the second target bone feature matrix; splice the first intermediate bone feature matrix and the first target contour feature matrix to obtain a second spliced feature matrix, and use the second transformation neural network to perform dimensional transformation on the second spliced feature matrix to obtain the second target contour feature matrix.
• The feature fusion neural network includes: a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, a fourth convolutional neural network, a third transformation neural network, and a fourth transformation neural network;
• the detection module 152 is configured to use the feature fusion neural network to perform feature fusion on the first target bone feature matrix and the first target contour feature matrix in the following manner to obtain the second target bone feature matrix and the second target contour feature matrix:
• use the first directional convolutional neural network to perform directional convolution processing on the first target bone feature matrix to obtain a first directional bone feature matrix, and use the third convolutional neural network to perform convolution processing on the first directional bone feature matrix to obtain a second intermediate bone feature matrix; use the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix to obtain a first directional contour feature matrix, and use the fourth convolutional neural network to perform convolution processing on the first directional contour feature matrix to obtain a second intermediate contour feature matrix; splice the second intermediate contour feature matrix and the first target bone feature matrix to obtain a third spliced feature matrix, and use the third transformation neural network to perform dimensional transformation on the third spliced feature matrix to obtain the second target bone feature matrix; splice the second intermediate bone feature matrix and the first target contour feature matrix to obtain a fourth spliced feature matrix, and use the fourth transformation neural network to perform dimensional transformation on the fourth spliced feature matrix to obtain the second target contour feature matrix.
  • the feature fusion neural network includes: a displacement estimation neural network and a fifth transformation neural network;
• the detection module 152 is configured to use the feature fusion neural network to perform feature fusion on the first target bone feature matrix and the first target contour feature matrix in the following manner to obtain the second target bone feature matrix and the second target contour feature matrix:
• perform splicing processing on the first target bone feature matrix and the first target contour feature matrix to obtain a fifth spliced feature matrix; input the fifth spliced feature matrix into the displacement estimation neural network, perform displacement estimation on the multiple predetermined key point pairs, and obtain the displacement information of one key point in each key point pair moving to the other key point; use each key point in each key point pair as the current key point, and obtain, from the three-dimensional feature matrix corresponding to the other key point paired with the current key point, the two-dimensional feature matrix corresponding to that other key point; according to the displacement information from the other key point of the pair to the current key point, perform position transformation on the elements in the two-dimensional feature matrix corresponding to that other key point to obtain the displacement feature matrix corresponding to the current key point; and generate the second target bone feature matrix and the second target contour feature matrix as described in S1105 and S1106 above.
  • the human body detection method is implemented by a human body detection model; the human body detection model includes: the first feature extraction network and/or the feature fusion neural network; the human body detection model is trained using sample images in a training sample set, the sample images being annotated with the actual position information of the skeleton keypoints of the human skeletal structure and the actual position information of the contour keypoints of the human body contour.
  • an embodiment of the present disclosure also provides a computer device. FIG. 16 is a schematic structural diagram of the computer device provided by the embodiment of the present disclosure, including:
  • a processor 11 and a storage medium 12 that communicate through a bus 13, so that the processor 11 executes the following instructions: acquiring an image to be detected; determining, based on the image to be detected, position information of skeleton keypoints used to characterize the human skeletal structure and position information of contour keypoints used to characterize the human body contour; and generating a human body detection result based on the position information of the skeleton keypoints and the position information of the contour keypoints.
  • an embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program that, when run by a processor, executes the steps of the human body detection method described in the above method embodiments.
  • the computer program product of the human body detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code, and the program code includes instructions that can be used to execute the steps of the human body detection method described in the above method embodiments.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile, processor-executable computer-readable storage medium.
  • based on this understanding, the part of the technical solution of the present disclosure that in essence contributes over the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Abstract

A human body detection method, apparatus, computer device and storage medium, wherein the method includes: acquiring an image to be detected (S101); determining, based on the image to be detected, position information of skeleton keypoints used to characterize the human skeletal structure and position information of contour keypoints used to characterize the human body contour (S102); and generating a human body detection result based on the position information of the skeleton keypoints and the position information of the contour keypoints (S103).

Description

Human body detection method, apparatus, computer device and storage medium. Technical Field
The present disclosure relates to the technical field of image processing, and in particular to a human body detection method, apparatus, computer device and storage medium.
Background
With the application of neural networks to images, video, speech, text and other domains, users place ever higher demands on the accuracy of neural-network-based models. Human body detection in images is an important application scenario of neural networks, one with high requirements on both the fineness of the detection and the amount of computation involved.
Summary
An object of embodiments of the present disclosure is to provide a human body detection method, apparatus, computer device and storage medium.
In a first aspect, an embodiment of the present disclosure provides a human body detection method, including: acquiring an image to be detected; determining, based on the image to be detected, position information of skeleton keypoints used to characterize the human skeletal structure and position information of contour keypoints used to characterize the human body contour; and generating a human body detection result based on the position information of the skeleton keypoints and the position information of the contour keypoints.
Embodiments of the present disclosure can determine, from an image to be detected, the position information of skeleton keypoints characterizing the human skeletal structure and of contour keypoints characterizing the human body contour, and generate a human body detection result from both, improving the fineness of the representation while keeping the amount of computation in check.
In addition, since embodiments of the present disclosure obtain the detection result from the position information of skeleton keypoints characterizing the skeletal structure together with that of contour keypoints characterizing the body contour, the information representing the human body is richer, supporting broader application scenarios such as image editing and body shape adjustment.
In an optional implementation, the contour keypoints include main contour keypoints and auxiliary contour keypoints, wherein at least one auxiliary contour keypoint exists between two adjacent main contour keypoints.
In this implementation, characterizing the human body contour by the position information of both main and auxiliary contour keypoints makes the labeling of the contour more precise and the information richer.
In an optional implementation, determining, based on the image to be detected, position information of contour keypoints used to characterize the human body contour includes: determining position information of the main contour keypoints based on the image to be detected; determining human body contour information based on the position information of the main contour keypoints; and determining position information of a plurality of auxiliary contour keypoints based on the determined human body contour information.
In this implementation, the position information of the main and auxiliary contour keypoints can be located more precisely.
In an optional implementation, the human body detection result includes one or more of: the image to be detected with skeleton keypoint markers and contour keypoint markers added; and a data set including the position information of the skeleton keypoints and the position information of the contour keypoints.
In this implementation, the image carrying skeleton keypoint markers and contour keypoint markers gives a more intuitive visual impression, while the data set of keypoint position information is easier to process downstream.
In an optional implementation, the method further includes: performing, based on the human body detection result, one or more of the following operations: human action recognition, human pose detection, human contour adjustment, human image editing, and human body sticker pasting.
In this implementation, based on a detection result that is finer in representation and lighter in computation, more operations can be performed with higher precision and at higher speed.
In an optional implementation, determining, based on the image to be detected, the position information of the skeleton keypoints characterizing the human skeletal structure and of the contour keypoints characterizing the human body contour includes: performing feature extraction based on the image to be detected to obtain skeleton features and contour features, and fusing the obtained skeleton features and contour features; and determining the position information of the skeleton keypoints and of the contour keypoints based on the feature fusion result.
In this implementation, feature extraction is performed on the image to be detected to obtain skeleton features and contour features, which are then fused to obtain the position information of the skeleton keypoints characterizing the human skeletal structure and of the contour keypoints characterizing the human body contour. A detection result obtained this way both represents the human body with less data and extracts the body's skeleton and contour features to represent it, improving representation fineness at the same time.
In an optional implementation, the feature extraction and fusion include: performing feature extraction at least once based on the image to be detected, and fusing the skeleton and contour features obtained from each extraction, wherein, in the case of multiple feature extractions, the (i+1)-th feature extraction is performed based on the feature fusion result of the i-th feature fusion, i being a positive integer; and determining the keypoint position information based on the feature fusion result includes: determining the position information of the skeleton keypoints and of the contour keypoints based on the feature fusion result of the last feature fusion.
In this implementation, performing feature extraction on the image at least once and fusing the skeleton and contour features of each extraction lets positionally related skeleton and contour feature points correct each other, so the finally obtained position information of the skeleton and contour keypoints has higher precision.
In an optional implementation, performing feature extraction at least once based on the image to be detected includes: in the first feature extraction, using a pre-trained first feature extraction network to extract, from the image to be detected, a first target skeleton feature matrix of skeleton keypoints characterizing human skeleton features, and a first target contour feature matrix of contour keypoints characterizing human contour features; and in the (i+1)-th feature extraction, using a pre-trained second feature extraction network to extract the first target skeleton feature matrix and the first target contour feature matrix from the feature fusion result of the i-th feature fusion; wherein the network parameters of the first and second feature extraction networks differ, and different rounds of feature extraction use second feature extraction networks with different network parameters.
In this embodiment, the skeleton and contour features are extracted at least once and fused at least once, so the finally obtained position information of the skeleton and contour keypoints has higher precision.
In an optional implementation, fusing the extracted skeleton and contour features includes: using a pre-trained feature fusion neural network to fuse the first target skeleton feature matrix and the first target contour feature matrix, obtaining a second target skeleton feature matrix and a second target contour feature matrix; wherein the second target skeleton feature matrix is a three-dimensional skeleton feature matrix including a two-dimensional skeleton feature matrix for each skeleton keypoint, the value of each element in a two-dimensional skeleton feature matrix characterizing the probability that the pixel corresponding to that element belongs to the corresponding skeleton keypoint; the second target contour feature matrix is a three-dimensional contour feature matrix including a two-dimensional contour feature matrix for each contour keypoint, the value of each element in a two-dimensional contour feature matrix characterizing the probability that the pixel corresponding to that element belongs to the corresponding contour keypoint; and different rounds of feature fusion use feature fusion neural networks with different network parameters.
In this implementation, fusing the skeleton and contour features with a pre-trained feature fusion network yields better fusion results, giving the finally obtained keypoint position information higher precision.
In an optional implementation, determining the keypoint positions based on the feature fusion result of the last feature fusion includes: determining the position information of the skeleton keypoints based on the second target skeleton feature matrix obtained from the last feature fusion, and determining the position information of the contour keypoints based on the second target contour feature matrix obtained from the last feature fusion.
In this implementation, after at least one round of feature extraction and feature fusion, the finally obtained keypoint position information has higher precision.
In an optional implementation, the first feature extraction network includes: a shared feature extraction network, a first skeleton feature extraction network, and a first contour feature extraction network. Extracting the first target skeleton feature matrix and the first target contour feature matrix from the image to be detected with the first feature extraction network includes: using the shared feature extraction network to perform convolution processing on the image to be detected, obtaining a basic feature matrix containing skeleton features and contour features; using the first skeleton feature extraction network to perform convolution processing on the basic feature matrix to obtain a first skeleton feature matrix, obtaining a second skeleton feature matrix from a first target convolutional layer in the first skeleton feature extraction network, and obtaining the first target skeleton feature matrix based on the first and second skeleton feature matrices, the first target convolutional layer being any convolutional layer in the first skeleton feature extraction network other than the last one; and using the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, obtaining a second contour feature matrix from a second target convolutional layer in the first contour feature extraction network, and obtaining the first target contour feature matrix based on the first and second contour feature matrices, the second target convolutional layer being any convolutional layer in the first contour feature extraction network other than the last one.
In this implementation, the shared feature extraction network extracts skeleton and contour features while discarding the other features of the image to be detected, after which the first skeleton feature extraction network extracts the skeleton features and the first contour feature extraction network extracts the contour features in a targeted way, which costs less computation.
In an optional implementation, obtaining the first target skeleton feature matrix based on the first and second skeleton feature matrices includes: splicing the first skeleton feature matrix with the second skeleton feature matrix to obtain a first spliced skeleton feature matrix, and performing dimensional transformation processing on it to obtain the first target skeleton feature matrix; and obtaining the first target contour feature matrix based on the first and second contour feature matrices includes: splicing the first contour feature matrix with the second contour feature matrix to obtain a first spliced contour feature matrix, and performing dimensional transformation processing on it to obtain the first target contour feature matrix; wherein the first target skeleton feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each dimension.
In this implementation, splicing the first and second skeleton feature matrices gives the first target skeleton feature matrix richer skeleton feature information, and likewise splicing the first and second contour feature matrices gives the first target contour feature matrix richer contour feature information, so that in the subsequent feature fusion the keypoint position information can be extracted with higher precision.
In an optional implementation, the feature fusion neural network includes: a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network. Fusing the first target skeleton feature matrix and the first target contour feature matrix with the feature fusion neural network to obtain the second target skeleton feature matrix and the second target contour feature matrix includes: using the first convolutional neural network to perform convolution processing on the first target skeleton feature matrix to obtain a first intermediate skeleton feature matrix, and using the second convolutional neural network to perform convolution processing on the first target contour feature matrix to obtain a first intermediate contour feature matrix; splicing the first intermediate contour feature matrix with the first target skeleton feature matrix to obtain a first spliced feature matrix, and using the first transformation neural network to perform dimensional transformation on it to obtain the second target skeleton feature matrix; and splicing the first intermediate skeleton feature matrix with the first target contour feature matrix to obtain a second spliced feature matrix, and using the second transformation neural network to perform dimensional transformation on it to obtain the second target contour feature matrix.
In this implementation, splicing the first intermediate contour feature matrix with the first target skeleton feature matrix and deriving the second target skeleton feature matrix from the result fuses the skeleton and contour features, so that the contour features correct the extracted skeleton features; likewise, splicing the first intermediate skeleton feature matrix with the first target contour feature matrix and deriving the second target contour feature matrix from the result fuses the two kinds of features so that the skeleton features correct the extracted contour features. The keypoint position information can thereby be extracted with higher precision.
In an optional implementation, the feature fusion neural network includes: a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, a fourth convolutional neural network, a third transformation neural network, and a fourth transformation neural network. The fusion includes: using the first directional convolutional neural network to perform directional convolution processing on the first target skeleton feature matrix to obtain a first directional skeleton feature matrix, and using the third convolutional neural network to perform convolution processing on it to obtain a second intermediate skeleton feature matrix; using the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix to obtain a first directional contour feature matrix, and using the fourth convolutional neural network to perform convolution processing on it to obtain a second intermediate contour feature matrix; splicing the second intermediate contour feature matrix with the first target skeleton feature matrix to obtain a third spliced feature matrix, and using the third transformation neural network to perform dimensional transformation on it to obtain the second target skeleton feature matrix; and splicing the second intermediate skeleton feature matrix with the first target contour feature matrix to obtain a fourth spliced feature matrix, and using the fourth transformation neural network to perform dimensional transformation on it to obtain the second target contour feature matrix.
In this implementation, fusing the features by way of directional convolution lets the keypoint position information be extracted with higher precision.
In an optional implementation, the feature fusion neural network includes: a displacement estimation neural network and a fifth transformation neural network. The fusion includes: splicing the first target skeleton feature matrix and the first target contour feature matrix to obtain a fifth spliced feature matrix; inputting the fifth spliced feature matrix into the displacement estimation neural network and performing displacement estimation on multiple predetermined groups of keypoint pairs, obtaining, for each pair, the displacement information for moving one keypoint of the pair to the other; taking each keypoint of each pair in turn as the current keypoint and obtaining, from the three-dimensional feature matrix corresponding to the other keypoint paired with the current keypoint, the two-dimensional feature matrix corresponding to that paired keypoint; performing, according to the displacement information from the paired keypoint to the current keypoint, a position transformation on the elements of the two-dimensional feature matrix corresponding to the paired keypoint, obtaining the displacement feature matrix corresponding to the current keypoint; for each skeleton keypoint, splicing its two-dimensional feature matrix with each of its displacement feature matrices to obtain that skeleton keypoint's spliced two-dimensional feature matrix, inputting it into the fifth transformation neural network to obtain the target two-dimensional feature matrix corresponding to that skeleton keypoint, and generating the second target skeleton feature matrix based on the target two-dimensional feature matrices of all skeleton keypoints; and for each contour keypoint, splicing its two-dimensional feature matrix with each of its displacement feature matrices to obtain that contour keypoint's spliced two-dimensional feature matrix, inputting it into the fifth transformation neural network to obtain the target two-dimensional feature matrix corresponding to that contour keypoint, and generating the second target contour feature matrix based on the target two-dimensional feature matrices of all contour keypoints.
In this implementation, achieving feature fusion by displacement transformation of the skeleton and contour keypoints lets the keypoint position information be extracted with higher precision.
In an optional implementation, the human body detection method is implemented by a human body detection model; the human body detection model includes the first feature extraction network and/or the feature fusion neural network; the human body detection model is trained using sample images in a training sample set, the sample images being annotated with the actual position information of the skeleton keypoints of the human skeletal structure and the actual position information of the contour keypoints of the human body contour.
In this implementation, a human body detection model obtained by this training method has higher detection precision, and through this model a detection result balancing representation fineness against the amount of computation can be obtained.
In a second aspect, an embodiment of the present disclosure further provides a human body detection apparatus, including: an acquisition module configured to acquire an image to be detected; a detection module configured to determine, based on the image to be detected, position information of skeleton keypoints used to characterize the human skeletal structure and position information of contour keypoints used to characterize the human body contour; and a generation module configured to generate a human body detection result based on the position information of the skeleton keypoints and the position information of the contour keypoints.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including: a processor, a non-transitory storage medium and a bus, the non-transitory storage medium storing machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the storage medium through the bus, and when the machine-readable instructions are executed by the processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon; when the computer program is run by a processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.
Embodiments of the present disclosure can determine, from an image to be detected, the position information of skeleton keypoints characterizing the human skeletal structure and of contour keypoints characterizing the human body contour, and generate a human body detection result from both, improving the fineness of the representation while keeping the amount of computation in check.
To make the above objects, features and advantages of the present disclosure clearer and easier to understand, preferred embodiments are set out below and described in detail with reference to the accompanying drawings.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present disclosure for illustrative purposes and are not limiting; for a person of ordinary skill in the art, other related drawings can be derived from them without inventive effort. Identical or similar reference signs in the drawings denote identical or equivalent elements; once a reference sign is defined in one drawing, it needs no further definition or explanation in subsequent drawings.
FIG. 1 shows a flowchart of a human body detection method provided by an embodiment of the present disclosure.
FIG. 2a shows an example of the positions of contour keypoints and skeleton keypoints provided by an embodiment of the present disclosure.
FIG. 2b shows an example of the positions of main contour keypoints and auxiliary contour keypoints provided by an embodiment of the present disclosure.
FIG. 2c shows another example of the positions of main contour keypoints and auxiliary contour keypoints provided by an embodiment of the present disclosure.
FIG. 2d shows another example of the positions of main contour keypoints and auxiliary contour keypoints provided by an embodiment of the present disclosure.
FIG. 3 shows a schematic structural diagram of a first feature extraction network provided by an embodiment of the present disclosure.
FIG. 4 shows a flowchart of a feature extraction method provided by an embodiment of the present disclosure.
FIG. 5 shows a schematic structural diagram of a feature fusion network provided by an embodiment of the present disclosure.
FIG. 6 shows a flowchart of a feature fusion method provided by an embodiment of the present disclosure.
FIG. 7 shows a schematic structural diagram of another feature fusion network provided by an embodiment of the present disclosure.
FIG. 8 shows a flowchart of another feature fusion method provided by an embodiment of the present disclosure.
FIG. 9a shows a schematic diagram of an iterative update process using a scatter convolution operator, provided by an embodiment of the present disclosure.
FIG. 9b shows a schematic diagram of an iterative update process using a gather convolution operator, provided by an embodiment of the present disclosure.
FIG. 10 shows a schematic structural diagram of another feature fusion network provided by an embodiment of the present disclosure.
FIG. 11 shows a flowchart of another feature fusion method provided by an embodiment of the present disclosure.
FIG. 12 shows an example of skeleton keypoints and contour keypoints provided by an embodiment of the present disclosure.
FIG. 13 shows a concrete example, provided by an embodiment of the present disclosure, of a displacement transformation of the elements of a two-dimensional feature matrix.
FIG. 14 shows a schematic structural diagram of a second feature extraction network provided by an embodiment of the present disclosure.
FIG. 15 shows a schematic diagram of a human body detection apparatus provided by an embodiment of the present disclosure.
FIG. 16 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The components of the embodiments, as generally described and shown in the drawings, can be arranged and designed in a variety of configurations. The following detailed description of the embodiments of the present disclosure, given with reference to the drawings, is therefore not intended to limit the claimed scope of the present disclosure but merely presents its embodiments. All other embodiments obtained by those skilled in the art without inventive effort on the basis of the embodiments of the present disclosure fall within the scope of protection of the present disclosure.
Research has found that human body detection is usually performed in one of two ways: skeleton keypoint detection and semantic segmentation.
Skeleton keypoint detection: a neural network model extracts the human skeleton keypoints from the image and produces the detection result from them. This detection method uses a simple representation of the human body with less data, so subsequent processing based on its results also requires less computation; it is mostly used in fields such as human pose and action recognition, for example behavior detection and pose-based human-computer interaction. However, since this method cannot extract the body's contour information, the resulting detection result has a low representation fineness.
Semantic segmentation: a semantic segmentation model estimates, for every pixel of the image, the probability that it belongs to a human body, and the detection result is obtained from these per-pixel probabilities. Although this method recovers the complete contour of the body, the detection result involves a large amount of computation.
A human body detection method that balances representation fineness against the amount of computation is therefore an urgent problem to be solved.
Based on the above research, the present disclosure provides a human body detection method, apparatus, computer device and storage medium that perform feature extraction on the image to be detected to obtain the body's skeleton and contour features, fuse the extracted skeleton and contour features, and thereby obtain the position information of skeleton keypoints characterizing the human skeletal structure and of contour keypoints characterizing the human body contour. A detection result obtained this way involves less data while reflecting both the skeleton and contour features of the body, improving representation fineness at the same time.
In addition, since embodiments of the present disclosure obtain the detection result from the position information of skeleton keypoints characterizing the skeletal structure together with that of contour keypoints characterizing the body contour, the information representing the human body is richer, supporting broader application scenarios.
The defects of existing human body detection approaches can only be identified after repeated practice and careful study; the process of discovering these problems, as well as the solutions proposed by the present disclosure below, should therefore all fall within the scope of the present disclosure.
A human body detection method according to an embodiment of the present disclosure is introduced in detail below; the method is applicable to any device with data processing capability, such as a computer.
Referring to FIG. 1, a flowchart of the human body detection method provided by an embodiment of the present disclosure is shown, wherein:
S101: acquire an image to be detected.
S102: based on the image to be detected, determine position information of skeleton keypoints used to characterize the human skeletal structure and position information of contour keypoints used to characterize the human body contour.
S103: generate a human body detection result based on the position information of the skeleton keypoints and the position information of the contour keypoints.
S101 to S103 are explained separately below.
I: In S101 above, the image to be detected may be, for example, an image captured by a camera installed at a target location, an image sent by another computer device, or a pre-saved image read from a local database. The image to be detected may or may not contain a human body image; if it does, the final human body detection result can be obtained with the detection method provided by embodiments of the present disclosure; if it does not, the resulting detection result is, for example, empty.
II: In S102 above, as shown in FIG. 2a, the skeleton keypoints can be used to characterize the body's skeleton features, which include the features of the body's joints, such as the elbow, wrist, shoulder, neck, hip, knee and ankle joints. Illustratively, a skeleton keypoint can also be placed on the head.
The contour keypoints can be used to characterize the body's contour features and may include: main contour keypoints, as shown in FIG. 2a; or main contour keypoints and auxiliary contour keypoints, as shown in FIGS. 2b to 2d, where FIGS. 2b to 2d are close-ups of the region inside the wireframe in FIG. 2a.
The main contour keypoints are contour keypoints characterizing the contours of the body's joints, as shown in FIG. 2a, for example the contour of the elbow, wrist, shoulder, neck, hip, knee or ankle joint; they generally appear in correspondence with the skeleton keypoints characterizing the same joints.
The auxiliary contour keypoints are contour keypoints characterizing the contour between joints; there is at least one auxiliary contour keypoint between two adjacent main contour keypoints. In the example of FIG. 2b there is one auxiliary contour keypoint between two main contour keypoints; in the example of FIG. 2c there are two; in the example of FIG. 2d there are three.
The skeleton and contour keypoints in the above figures and text are only examples to aid understanding of the present disclosure. In practical applications, the number and positions of the skeleton and contour keypoints can be adjusted to the actual scenario; the present disclosure does not limit this.
In the case where the contour keypoints include main and auxiliary contour keypoints, the position information of the contour keypoints characterizing the human body contour can be determined from the image to be detected as follows:
determine the position information of the main contour keypoints based on the image to be detected; determine human body contour information based on the position information of the main contour keypoints; and determine the position information of a plurality of auxiliary contour keypoints based on the determined human body contour information.
In the case where the contour keypoints include only main contour keypoints, the position information of the main contour keypoints is determined directly from the image to be detected.
An embodiment of the present disclosure provides a specific method for determining, based on the image to be detected, the position information of the skeleton keypoints characterizing the human skeletal structure and of the contour keypoints characterizing the human body contour:
perform feature extraction based on the image to be detected to obtain skeleton features and contour features, fuse the obtained skeleton and contour features, and determine the position information of the skeleton keypoints and of the contour keypoints based on the feature fusion result.
The extraction of skeleton and contour features based on the image to be detected may adopt, but is not limited to, either of the following A or B.
A: Perform feature extraction on the image to be detected once, and fuse the skeleton and contour features obtained from that extraction.
B: Perform feature extraction on the image to be detected multiple times; after each extraction, fuse the skeleton and contour features obtained from that extraction; and determine the position information of the skeleton and contour keypoints based on the feature fusion result of the last feature fusion.
Case A is described in detail first.
In case A, the position information of the skeleton keypoints characterizing the skeletal structure and of the contour keypoints characterizing the body contour is determined based on the feature fusion result of that single fusion.
The feature extraction process and the feature fusion process are explained in a1 and a2 below, respectively.
a1: Feature extraction process:
A pre-trained first feature extraction network can be used to extract, from the image to be detected, a first target skeleton feature matrix of the skeleton keypoints characterizing the human skeleton features, and a first target contour feature matrix of the contour keypoints characterizing the human contour features.
Specifically, referring to FIG. 3, an embodiment of the present disclosure provides a schematic structural diagram of a first feature extraction network, which includes: a shared feature extraction network, a first skeleton feature extraction network, and a first contour feature extraction network.
Referring to FIG. 4, an embodiment of the present disclosure further provides a specific process, based on the first feature extraction network of FIG. 3, for extracting the first target skeleton feature matrix and the first target contour feature matrix from the image to be detected, including the following steps.
S401: Use the shared feature extraction network to perform convolution processing on the image to be detected, obtaining a basic feature matrix containing skeleton features and contour features.
In a specific implementation, the image to be detected can be represented as an image matrix. If it is a single-channel image, such as a grayscale image, it can be represented as a two-dimensional image matrix whose elements correspond one-to-one to the pixels of the image, the value of each element being the pixel value of the corresponding pixel. If it is a multi-channel image, such as an RGB image, it can be represented as a three-dimensional image matrix containing three two-dimensional image matrices corresponding one-to-one to the color channels (e.g., R, G, B); the value of each element of any of these two-dimensional matrices is the pixel value, in the corresponding color channel, of the pixel corresponding to that element.
The shared feature extraction network includes at least one convolutional layer. After the image matrix of the image to be detected is input to the shared feature extraction network, the network performs convolution processing on the image matrix to extract the features in the image. In this case, the extracted features contain both skeleton features and contour features.
S402: Use the first skeleton feature extraction network to perform convolution processing on the basic feature matrix to obtain a first skeleton feature matrix, and obtain a second skeleton feature matrix from a first target convolutional layer in the first skeleton feature extraction network; based on the first and second skeleton feature matrices, obtain the first target skeleton feature matrix. The first target convolutional layer is any convolutional layer in the first skeleton feature extraction network other than the last one.
In a specific implementation, the first skeleton feature extraction network includes multiple convolutional layers connected in sequence, the input of each layer being the output of the previous one. With this structure, the network performs multiple convolutions on the basic feature matrix and obtains the first skeleton feature matrix from the last convolutional layer. Here the first skeleton feature matrix is a three-dimensional feature matrix containing multiple two-dimensional feature matrices, each corresponding one-to-one to one of the predetermined skeleton keypoints. The value of an element in the two-dimensional feature matrix of a skeleton keypoint represents the probability that the pixels corresponding to that element belong to that keypoint, and an element generally corresponds to more than one pixel.
Moreover, although the repeated convolutions over the basic feature matrix extract the body's skeleton features, as the number of convolutions grows, some information in the image to be detected is lost, and this lost information may include information relevant to the skeleton features; if too much is lost, the finally obtained first target skeleton feature matrix characterizing the skeleton keypoints may not be precise enough. Therefore, in this embodiment of the present disclosure, a second skeleton feature matrix is also obtained from the first target convolutional layer of the first skeleton feature extraction network, and the first target skeleton feature matrix is obtained from the first and second skeleton feature matrices.
Here, the first target convolutional layer is any convolutional layer in the first skeleton feature extraction network other than the last one. In the example of FIG. 3, the penultimate convolutional layer of the first skeleton feature extraction network is selected as the first target convolutional layer.
For example, the first target skeleton feature matrix can be obtained from the first and second skeleton feature matrices as follows:
splice the first skeleton feature matrix with the second skeleton feature matrix to obtain a first spliced skeleton feature matrix, and perform dimensional transformation processing on the first spliced skeleton feature matrix to obtain the first target skeleton feature matrix.
Here, when performing the dimensional transformation on the first spliced skeleton feature matrix, it can be input to a dimension transformation neural network, which performs convolution processing on it at least once to obtain the first target skeleton feature matrix.
The dimension transformation neural network can fuse the feature information carried by the first and second skeleton feature matrices, so that the resulting first target skeleton feature matrix contains richer information.
S403: Use the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, and obtain a second contour feature matrix from a second target convolutional layer in the first contour feature extraction network; based on the first and second contour feature matrices, obtain the first target contour feature matrix. The second target convolutional layer is any convolutional layer in the first contour feature extraction network other than the last one. In the example of FIG. 3, the penultimate convolutional layer of the first contour feature extraction network is selected as the second target convolutional layer.
In a specific implementation, the first contour feature extraction network likewise includes multiple convolutional layers connected in sequence, the input of each layer being the output of the previous one. With this structure, the network performs multiple convolutions on the basic feature matrix and obtains the first contour feature matrix from the last convolutional layer. The first contour feature matrix is a three-dimensional feature matrix containing multiple two-dimensional feature matrices corresponding one-to-one to the predetermined contour keypoints. The value of an element in the two-dimensional feature matrix of a contour keypoint represents the probability that the pixels corresponding to that element belong to that keypoint, and an element generally corresponds to more than one pixel.
Note that the number of contour keypoints generally differs from the number of skeleton keypoints, so the number of two-dimensional feature matrices included in the first contour feature matrix may differ from the number included in the first skeleton feature matrix.
For example, with 14 skeleton keypoints and 25 contour keypoints, the first contour feature matrix includes 25 two-dimensional feature matrices and the first skeleton feature matrix includes 14.
In addition, so that the first target contour feature matrix also contains richer information, in a manner similar to S402 above, a second contour feature matrix can be obtained from the second target convolutional layer of the first contour feature extraction network, and the first target contour feature matrix then obtained from the first and second contour feature matrices.
Here, obtaining the first target contour feature matrix from the first and second contour feature matrices includes, for example:
splicing the first contour feature matrix with the second contour feature matrix to obtain a first spliced contour feature matrix, and performing dimensional transformation processing on the first spliced contour feature matrix to obtain the first target contour feature matrix.
Note that in S402 and S403 above, the first target skeleton feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each dimension, so that feature fusion can subsequently be performed on the two matrices.
For example, if the first target skeleton feature matrix has 3 dimensions of sizes 64, 32 and 14, its shape is written 64*32*14; the shape of the first target contour feature matrix can likewise be written 64*32*14.
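To make the extraction pipeline of S401 to S403 concrete, the following is a minimal sketch in Python (PyTorch). The 14/25 channel counts follow the keypoint-count example above; every layer size, the 3x3 kernels, and the choice of the penultimate layer as the target convolutional layer are illustrative assumptions, not the disclosed configuration.

```python
import torch
import torch.nn as nn

class FirstFeatureExtractor(nn.Module):
    """Sketch of the shared network plus the skeleton and contour branches."""
    def __init__(self, n_skeleton=14, n_contour=25):
        super().__init__()
        # Shared feature extraction: keeps only skeleton/contour-relevant cues.
        self.shared = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Skeleton branch: the penultimate layer plays the role of the
        # "first target convolutional layer" whose output is tapped.
        self.skel_penult = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.skel_last = nn.Conv2d(64, n_skeleton, 3, padding=1)
        # Dimension transformation: fuses the spliced matrices back to one
        # probability map per skeleton keypoint.
        self.skel_transform = nn.Conv2d(64 + n_skeleton, n_skeleton, 1)
        # Contour branch, mirroring the skeleton branch.
        self.cont_penult = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.cont_last = nn.Conv2d(64, n_contour, 3, padding=1)
        self.cont_transform = nn.Conv2d(64 + n_contour, n_contour, 1)

    def forward(self, image):
        base = self.shared(image)                       # basic feature matrix
        skel2 = self.skel_penult(base)                  # second skeleton feature matrix
        skel1 = self.skel_last(skel2)                   # first skeleton feature matrix
        # Splice (concatenate along channels), then dimension-transform.
        skel_target = self.skel_transform(torch.cat([skel1, skel2], dim=1))
        cont2 = self.cont_penult(base)                  # second contour feature matrix
        cont1 = self.cont_last(cont2)                   # first contour feature matrix
        cont_target = self.cont_transform(torch.cat([cont1, cont2], dim=1))
        return skel_target, cont_target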
In addition, in another embodiment, the first target skeleton feature matrix and the first target contour feature matrix can also be obtained as follows:
use the shared feature extraction network to perform convolution processing on the image to be detected, obtaining a basic feature matrix containing skeleton and contour features;
use the first skeleton feature extraction network to perform convolution processing on the basic feature matrix to obtain a first skeleton feature matrix, and perform dimensional transformation processing on the first skeleton feature matrix to obtain the first target skeleton feature matrix;
use the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, and perform dimensional transformation processing on the first contour feature matrix to obtain the first target contour feature matrix.
In this manner, too, the body's skeleton and contour features can be extracted from the image to be detected with relatively high precision.
In addition, the first feature extraction network provided in the embodiments of the present disclosure is obtained by pre-training.
Here, the human body detection method provided by the embodiments of the present disclosure is implemented by a human body detection model; the human body detection model includes: the first feature extraction network and/or the feature fusion neural network.
The human body detection model is trained using sample images in a training sample set, the sample images being annotated with the actual position information of the skeleton keypoints of the human skeletal structure and the actual position information of the contour keypoints of the human body contour.
Specifically, where the human body detection model includes the first feature extraction network, the first feature extraction network may be trained separately, trained jointly with the feature fusion neural network, or trained with a combination of separate and joint training.
The process of training the first feature extraction network includes, but is not limited to, (1) and (2) below.
(1) Training the first feature extraction network separately includes, for example:
Step 1.1: acquire multiple sample images and the annotation data of each sample image; the annotation data include: the actual position information of the skeleton keypoints characterizing the human skeletal structure, and the actual position information of the contour keypoints characterizing the human body contour;
Step 1.2: input the sample images into a first basic feature extraction network to obtain a first sample target skeleton feature matrix and a first sample target contour feature matrix;
Step 1.3: determine first predicted position information of the skeleton keypoints based on the first sample target skeleton feature matrix, and first predicted position information of the contour keypoints based on the first sample target contour feature matrix;
Step 1.4: determine a first loss based on the actual position information and the first predicted position information of the skeleton keypoints, and a second loss based on the actual position information and the first predicted position information of the contour keypoints;
Step 1.5: perform this round of training of the first basic feature extraction network based on the first loss and the second loss.
After multiple rounds of training of the first basic feature extraction network, the first feature extraction network is obtained.
As shown in FIG. 3, the first loss is LS1 and the second loss is LC1 in FIG. 3. The first and second losses supervise the training of the first basic feature extraction network so as to obtain a first feature extraction network with higher precision.
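As an illustrative aid for the supervision just described, here is a minimal sketch in Python (PyTorch) of a per-keypoint loss such as LS1 or LC1. Rendering Gaussian target heatmaps from the annotated positions and using a mean-squared error are common conventions assumed here; the text only states that a loss is formed from the actual and predicted position information.

```python
import torch

def keypoint_heatmap_loss(pred, keypoints, sigma=2.0):
    """MSE between predicted probability maps and Gaussian targets.

    pred:      (N, K, H, W) predicted per-keypoint probability maps
    keypoints: (N, K, 2) annotated (x, y) positions in map coordinates
    """
    n, k, h, w = pred.shape
    ys = torch.arange(h, dtype=pred.dtype, device=pred.device).view(1, 1, h, 1)
    xs = torch.arange(w, dtype=pred.dtype, device=pred.device).view(1, 1, 1, w)
    gx = keypoints[..., 0].view(n, k, 1, 1)
    gy = keypoints[..., 1].view(n, k, 1, 1)
    # Gaussian bump centred on each annotated keypoint (assumed target form).
    target = torch.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2 * sigma ** 2))
    return torch.mean((pred - target) ** 2)

# One round of supervision would then combine both branches, e.g.
# loss = keypoint_heatmap_loss(skel_pred, skel_gt) + keypoint_heatmap_loss(cont_pred, cont_gt)
```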
(2) Jointly training the first feature extraction network and the feature fusion neural network includes, for example:
Step 2.1: acquire multiple sample images and the annotation data of each sample image; the annotation data include: the actual position information of the skeleton keypoints characterizing the human skeletal structure, and the actual position information of the contour keypoints characterizing the human body contour;
Step 2.2: input the sample images into a first basic feature extraction network to obtain a first sample target skeleton feature matrix and a first sample target contour feature matrix;
Step 2.3: use a basic feature fusion neural network to fuse the first sample target skeleton feature matrix and the first sample target contour feature matrix, obtaining a second sample target skeleton feature matrix and a second sample target contour feature matrix;
Step 2.4: determine second predicted position information of the skeleton keypoints based on the second sample target skeleton feature matrix, and second predicted position information of the contour keypoints based on the second sample target contour feature matrix;
Step 2.5: determine a third loss based on the actual position information and the second predicted position information of the skeleton keypoints, and a fourth loss based on the actual position information and the second predicted position information of the contour keypoints;
Step 2.6: perform this round of training of the first basic feature extraction network and the basic feature fusion neural network based on the third and fourth losses.
After multiple rounds of training of the first basic feature extraction network and the basic feature fusion neural network, the first feature extraction network and the feature fusion neural network are obtained.
(3) In the process of obtaining the first feature extraction network by combining separate and joint training, the procedures in (1) and (2) above can be carried out in step with each other.
Alternatively, the first feature extraction network can first be pre-trained with the procedure in (1), and the pre-trained first feature extraction network then trained jointly with the feature fusion neural network as in (2) above.
Note that the sample images used for the separate training and the joint training of the first feature extraction network may be the same or different.
Before the joint training of the first feature extraction network and the feature fusion neural network, the feature fusion neural network may also be pre-trained first, and the pre-trained feature fusion neural network then trained jointly with the first feature extraction network.
For the detailed process of training the feature fusion neural network separately, see the description of the embodiments shown in a2 below.
a2: Feature fusion process:
After obtaining the first target skeleton feature matrix of the skeleton keypoints characterizing the human skeleton features and the first target contour feature matrix of the contour keypoints characterizing the human contour features, feature fusion processing can be performed based on the first target skeleton feature matrix and the first target contour feature matrix.
Specifically, during the extraction of the skeleton and contour features from the image to be detected, although the same basic feature matrix is used, the first skeleton feature extraction network extracts skeleton features from it while the first contour feature extraction network extracts contour features from it; the two processes exist independently of each other. For the same human body, however, the contour features and skeleton features are mutually related, and the purpose of fusing them is to exploit the mutual influence between skeleton and contour features. For example, the position information of the finally extracted skeleton keypoints can be corrected based on the contour features, and the position information of the finally extracted contour keypoints corrected based on the skeleton features, yielding more accurate position information for both kinds of keypoints and thus a higher-precision human body detection result.
An embodiment of the present disclosure provides a specific method for fusing the extracted skeleton and contour features, including: using a pre-trained feature fusion neural network to fuse the first target skeleton feature matrix and the first target contour feature matrix, obtaining a second target skeleton feature matrix and a second target contour feature matrix.
The second target skeleton feature matrix is a three-dimensional skeleton feature matrix including a two-dimensional skeleton feature matrix for each skeleton keypoint; the value of each element in a two-dimensional skeleton feature matrix characterizes the probability that the pixel corresponding to that element belongs to the corresponding skeleton keypoint (i.e., the keypoint to which that two-dimensional matrix corresponds). The second target contour feature matrix is a three-dimensional contour feature matrix including a two-dimensional contour feature matrix for each contour keypoint; the value of each element characterizes the probability that the corresponding pixel belongs to the corresponding contour keypoint.
The feature fusion neural network provided in the embodiments of the present disclosure may be trained separately, trained jointly with the first feature extraction network, or trained with a combination of separate and joint training.
For the process of jointly training the feature fusion neural network with the first feature extraction network, see (2) above; details are not repeated here.
For feature fusion neural networks of different structures, the training methods used when training them separately also differ; for the training methods of the differently structured feature fusion neural networks, see M1 to M3 below.
The process of fusing the skeleton and contour features may include, but is not limited to, at least one of M1 to M3 below.
M1:
Referring to FIG. 5, an embodiment of the present disclosure provides a specific structure of a feature fusion neural network, including: a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network.
Referring to FIG. 6, an embodiment of the present disclosure further provides a specific method, based on the feature fusion neural network of FIG. 5, for fusing the first target skeleton feature matrix and the first target contour feature matrix to obtain the second target skeleton feature matrix and the second target contour feature matrix, including the following steps.
S601: Use the first convolutional neural network to perform convolution processing on the first target skeleton feature matrix, obtaining a first intermediate skeleton feature matrix. Proceed to S603.
Here the first convolutional neural network includes at least one convolutional layer. If it has multiple layers, they are connected in sequence, the input of each layer being the output of the previous one. The first target skeleton feature matrix is input to the first convolutional neural network, and the convolutional layers process it to obtain the first intermediate skeleton feature matrix.
The purpose of this step is to further extract the skeleton features from the first target skeleton feature matrix.
S602: Use the second convolutional neural network to perform convolution processing on the first target contour feature matrix, obtaining a first intermediate contour feature matrix. Proceed to S604.
This processing is similar to S601 above and is not repeated here.
Note that S601 and S602 have no fixed order of execution; they can be performed synchronously or asynchronously.
S603: Splice the first intermediate contour feature matrix with the first target skeleton feature matrix to obtain a first spliced feature matrix, and use the first transformation neural network to perform dimensional transformation on the first spliced feature matrix to obtain the second target skeleton feature matrix.
Here, splicing the first intermediate contour feature matrix with the first target skeleton feature matrix yields a first spliced feature matrix that includes both contour features and skeleton features.
Using the first transformation neural network to further transform the dimensions of the first spliced matrix is in effect using it to extract skeleton features from the first spliced feature matrix once more. Since the process of obtaining the first spliced feature matrix removed all features of the image to be detected other than the skeleton and contour features, the skeleton features contained in the second target skeleton feature matrix derived from it are influenced by the contour features; a mutual connection between skeleton and contour features is established, achieving the fusion of the two.
S604: Splice the first intermediate skeleton feature matrix with the first target contour feature matrix to obtain a second spliced feature matrix, and use the second transformation neural network to perform dimensional transformation on the second spliced feature matrix to obtain the second target contour feature matrix.
The process of obtaining the second spliced feature matrix here is similar to the process of obtaining the first spliced feature matrix in S603 above and is not repeated.
Likewise, the contour features contained in the second target contour feature matrix are influenced by the skeleton features, establishing the mutual connection between skeleton and contour features and achieving their fusion.
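A minimal sketch of fusion variant M1 (S601 to S604) in Python (PyTorch); the kernel sizes and the use of 1x1 convolutions as the transformation networks are assumptions:

```python
import torch
import torch.nn as nn

class CrossBranchFusion(nn.Module):
    """Each branch is convolved, then spliced onto the other branch's
    target matrix and dimension-transformed back to its own channel count."""
    def __init__(self, n_skeleton=14, n_contour=25):
        super().__init__()
        self.conv_skel = nn.Conv2d(n_skeleton, n_skeleton, 3, padding=1)  # first CNN
        self.conv_cont = nn.Conv2d(n_contour, n_contour, 3, padding=1)    # second CNN
        # First/second transformation networks: reduce the spliced channels.
        self.trans_skel = nn.Conv2d(n_skeleton + n_contour, n_skeleton, 1)
        self.trans_cont = nn.Conv2d(n_skeleton + n_contour, n_contour, 1)

    def forward(self, skel, cont):
        mid_skel = self.conv_skel(skel)   # first intermediate skeleton matrix
        mid_cont = self.conv_cont(cont)   # first intermediate contour matrix
        # Contour information corrects the skeleton branch, and vice versa.
        skel2 = self.trans_skel(torch.cat([mid_cont, skel], dim=1))
        cont2 = self.trans_cont(torch.cat([mid_skel, cont], dim=1))
        return skel2, cont2
```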
In another embodiment, the feature fusion neural network can be trained separately as follows.
Step 3.1: acquire the first sample target skeleton feature matrices and the first sample target contour feature matrices of multiple sample images.
The acquisition is similar to the way the first target skeleton and contour feature matrices are obtained in the above embodiments and is not repeated here. They can be acquired during joint training with the first feature extraction network, or with a pre-trained first feature extraction network.
Step 3.2: use a first basic convolutional neural network to perform convolution processing on the first sample target skeleton feature matrix, obtaining a first sample intermediate skeleton feature matrix.
Step 3.3: use a second basic convolutional neural network to perform convolution processing on the first sample target contour feature matrix, obtaining a first sample intermediate contour feature matrix.
Step 3.4: splice the first sample intermediate contour feature matrix with the first sample target skeleton feature matrix to obtain a first sample spliced feature matrix, and use a first basic transformation neural network to perform dimensional transformation on it to obtain a second sample target skeleton feature matrix.
Step 3.5: splice the first sample intermediate skeleton feature matrix with the first sample target contour feature matrix to obtain a second sample spliced feature matrix, and use a second basic transformation neural network to perform dimensional transformation on it to obtain a second sample target contour feature matrix.
Step 3.6: determine third predicted position information of the skeleton keypoints based on the second sample target skeleton feature matrix, and third predicted position information of the contour keypoints based on the second sample target contour feature matrix.
Step 3.7: determine a fifth loss based on the actual position information and the third predicted position information of the skeleton keypoints, and a sixth loss based on the actual position information and the third predicted position information of the contour keypoints.
Step 3.8: perform this round of training of the first basic convolutional neural network, the second basic convolutional neural network, the first basic transformation neural network, and the second basic transformation neural network based on the fifth and sixth losses.
After multiple rounds of training of the first basic convolutional neural network, the second basic convolutional neural network, the first basic transformation neural network and the second basic transformation neural network, the feature fusion neural network is obtained.
Here, the fifth loss is LS2 and the sixth loss is LC2 in FIG. 5.
M2:
Referring to FIG. 7, another specific structure of a feature fusion neural network provided by an embodiment of the present disclosure includes: a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, a fourth convolutional neural network, a third transformation neural network, and a fourth transformation neural network.
Referring to FIG. 8, an embodiment of the present disclosure further provides a specific method, based on the feature fusion neural network of FIG. 7, for fusing the first target skeleton feature matrix and the first target contour feature matrix to obtain the second target skeleton feature matrix and the second target contour feature matrix, including the following steps.
S801: Use the first directional convolutional neural network to perform directional convolution processing on the first target skeleton feature matrix, obtaining a first directional skeleton feature matrix. Use the third convolutional neural network to perform convolution processing on the first directional skeleton feature matrix, obtaining a second intermediate skeleton feature matrix. Proceed to S804.
S802: Use the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix, obtaining a first directional contour feature matrix; and use the fourth convolutional neural network to perform convolution processing on the first directional contour feature matrix, obtaining a second intermediate contour feature matrix. Proceed to S803.
S803: Splice the second intermediate contour feature matrix with the first target skeleton feature matrix to obtain a third spliced feature matrix, and use the third transformation neural network to perform dimensional transformation on it to obtain the second target skeleton feature matrix.
S804: Splice the second intermediate skeleton feature matrix with the first target contour feature matrix to obtain a fourth spliced feature matrix, and use the fourth transformation neural network to perform dimensional transformation on it to obtain the second target contour feature matrix.
In a specific implementation, when fusing the skeleton and contour features, the skeleton keypoints are typically concentrated on the body's skeleton while the contour keypoints are concentrated on the body's outline, i.e., distributed around the skeleton. A local spatial transformation therefore needs to be applied to the skeleton features and the contour features separately, for example transforming the skeleton features to the positions the contour features occupy in the contour feature matrix and the contour features to the positions the skeleton features occupy in the skeleton feature matrix, so as to better extract both kinds of features and achieve their fusion.
To this end, this embodiment of the present disclosure first uses the first directional convolutional neural network to perform directional convolution on the first target skeleton feature matrix; directional convolution can effectively realize a directional spatial transformation of the skeleton features at the feature level. The third convolutional neural network then convolves the resulting first directional skeleton feature matrix to obtain the second intermediate skeleton feature matrix. In this case, since the skeleton features have already undergone the directional spatial transformation in the first directional convolutional layers, they have effectively moved toward the contour features. The second intermediate skeleton feature matrix is then spliced with the first target contour feature matrix to obtain the fourth spliced feature matrix, which includes the contour features together with the directionally transformed skeleton features. The fourth transformation neural network then performs dimensional transformation on the fourth spliced feature matrix, that is, extracts the contour features from it once more. A second target contour feature matrix obtained this way is influenced by the skeleton features, achieving the fusion of skeleton and contour features.
Similarly, this embodiment first uses the second directional convolutional neural network to perform directional convolution on the first target contour feature matrix, effectively realizing a directional spatial transformation of the contour features at the feature level, and then uses the fourth convolutional neural network to convolve the resulting first directional contour feature matrix to obtain the second intermediate contour feature matrix; the contour features have thereby effectively moved toward the skeleton features. The second intermediate contour feature matrix is then spliced with the first target skeleton feature matrix to obtain the third spliced feature matrix, which includes the skeleton features together with the directionally transformed contour features. The third transformation neural network then performs dimensional transformation on it, that is, extracts the skeleton features from it once more. A second target skeleton feature matrix obtained this way is influenced by the contour features, achieving the fusion of skeleton and contour features.
Specifically, a directional convolution consists of multiple iterated convolution steps; an effective directional convolution satisfies the following requirements:
(1) in each iterated convolution step, only the element values of one group of elements in the feature matrix are updated;
(2) after the last iterated convolution step, the element value of every element has been updated exactly once.
Taking the directional convolution of the first target skeleton feature matrix as an example, to realize the directional convolution process, a sequence of feature functions {F_k}, k = 1, ..., K, can be defined to control the update order of the elements. The input of the function F_k is the position of each element of the first target skeleton feature matrix, and its output indicates whether that element is updated in the k-th iteration; the output is 1 (update) or 0 (do not update). Specifically, during the k-th iteration, only the element values of elements in the region where F_k = 1 are updated, while the element values elsewhere remain unchanged. The update of the i-th iteration can be written as:
T_i(X) = F_i · (W × T_{i-1}(X) + b) + (1 - F_i) · T_{i-1}(X),
where T_0(X) = X, X denotes the input of the directional convolution, i.e., the first target skeleton feature matrix, and W and b denote the weight and bias shared across the iterations.
To achieve the fusion of skeleton and contour features, a pair of symmetric directional convolution operators, i.e., two such feature function sequences {F_k}, can be set: a scatter convolution operator F_i^S and a gather convolution operator F_i^G. The scatter convolution operator updates the elements of the feature matrix from the inside outward in turn, while the gather convolution operator updates the elements from the outside inward.
When the first directional convolutional neural network performs directional convolution on the first target skeleton feature matrix, the skeleton feature elements are to be spatially transformed to the positions around each element (positions more relevant to the contour features), so the scatter convolution operator F_i^S is used; when the second directional convolutional neural network performs directional convolution on the first target contour feature matrix, the contour feature elements are to be spatially transformed toward the middle of the contour feature matrix (positions more relevant to the skeleton features), so the gather convolution operator F_i^G is used.
Specifically, the first directional convolutional neural network performs directional convolution on the first target skeleton feature matrix as follows.
The first target skeleton feature matrix is divided into multiple sub-matrices, each called a grid. If the first target skeleton feature matrix is three-dimensional with dimension sizes m, n and s, its shape is written m*n*s; with a grid size of 5, the shape of each grid can be written 5*5*s.
Then, for each grid, multiple iterated convolutions are performed with the scatter convolution operator F_i^S to obtain a target sub-matrix. FIG. 9a shows a process of updating the element values of a grid-size-5 sub-matrix in two iterations with the scatter convolution operator F_i^S, where a denotes the original sub-matrix, b the sub-matrix after one iteration, and c the sub-matrix after two iterations, i.e., the target sub-matrix.
The target sub-matrices corresponding to the individual grids are spliced together to obtain the first directional skeleton feature matrix.
Similarly, the second directional convolutional neural network performs directional convolution on the first target contour feature matrix as follows.
The first target contour feature matrix is divided into multiple sub-matrices, each called a grid. If the first target contour feature matrix is three-dimensional with dimension sizes m, n and s, its shape is written m*n*s; with a grid size of 5, the shape of each grid can be written 5*5*s.
Then, for each grid, multiple iterated convolutions are performed with the gather convolution operator F_i^G to obtain a target sub-matrix.
FIG. 9b shows a process of updating the element values of a grid-size-5 sub-matrix in two iterations with the gather convolution operator F_i^G, where a denotes the original sub-matrix, b the sub-matrix after one iteration, and c the sub-matrix after two iterations, i.e., the target sub-matrix.
The target sub-matrices corresponding to the individual grids are spliced together to obtain the first directional contour feature matrix.
Note that the iterated convolutions of the individual sub-matrices can be processed in parallel.
The examples in FIGS. 9a and 9b are merely examples of iteratively updating the element values of a sub-matrix with the scatter convolution operator F_i^S and the gather convolution operator F_i^G.
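The following sketch in Python (PyTorch) illustrates the update rule T_i(X) = F_i · (W × T_{i-1}(X) + b) + (1 - F_i) · T_{i-1}(X) on a single 5*5*s grid with ring-shaped scatter masks. Treating the centre element as the unchanged source, the specific mask shapes, and the layer sizes are assumptions about FIG. 9a rather than its exact contents; gather masks would simply apply the rings in reverse order (outer ring first).

```python
import torch
import torch.nn.functional as F

def make_scatter_masks(size=5, steps=2):
    """Update masks for the scatter operator F_i^S: concentric rings around
    the centre of a size x size grid, the inner ring updated first, so each
    element is updated at most once (requirement (2) above)."""
    c = size // 2
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    ring = torch.maximum((ys - c).abs(), (xs - c).abs())  # Chebyshev distance
    return [(ring == k + 1).float() for k in range(steps)]

def directional_conv(grid, weight, bias, masks):
    """Directional convolution over one grid of shape (s, 5, 5), with
    weight and bias shared across the iterations."""
    t = grid
    for m in masks:  # each mask selects the group of elements updated this step
        conv = F.conv2d(t.unsqueeze(0), weight, bias, padding=1).squeeze(0)
        t = m * conv + (1 - m) * t
    return t

# Example: s = 14 channels, grid size 5, two scatter iterations as in FIG. 9a.
s = 14
w = torch.randn(s, s, 3, 3) * 0.01
b = torch.zeros(s)
out = directional_conv(torch.randn(s, 5, 5), w, b, make_scatter_masks())
```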
In another embodiment, the feature fusion neural network can be trained separately as follows.
Step 4.1: acquire the first sample target skeleton feature matrices and the first sample target contour feature matrices of multiple sample images.
The acquisition is similar to the way the first target skeleton and contour feature matrices are obtained in the above embodiments and is not repeated here. They can be acquired during joint training with the first feature extraction network, or with a pre-trained first feature extraction network.
Step 4.2: use a first basic directional convolutional neural network to perform directional convolution on the first sample target skeleton feature matrix, obtaining a first sample directional skeleton feature matrix; use the first sample directional skeleton feature matrix and the actual position information of the contour keypoints to obtain a seventh loss, and perform this round of training of the first basic directional convolutional neural network based on the seventh loss.
Here, the seventh loss is LC3 in FIG. 7.
Here, performing directional convolution on the first sample target skeleton feature matrix with the first basic directional convolutional neural network amounts to a directional spatial transformation of that matrix. In this case, the keypoint position information characterized by the resulting first sample directional skeleton feature matrix should agree as closely as possible with the position information of the contour keypoints. The seventh loss is therefore obtained from the first sample directional skeleton feature matrix and the actual position information of the contour keypoints, and the seventh loss is used to supervise the training of the first basic directional convolutional neural network.
Step 4.3: use a second basic directional convolutional neural network to perform directional convolution on the first sample target contour feature matrix, obtaining a first sample directional contour feature matrix; use the first sample directional contour feature matrix and the actual position information of the skeleton keypoints to obtain an eighth loss, and perform this round of training of the second basic directional convolutional neural network based on the eighth loss.
Here, the eighth loss is LS3 in FIG. 7.
Step 4.4: use a fourth basic convolutional neural network to perform convolution on the first sample directional contour feature matrix, obtaining a second sample intermediate contour feature matrix; splice the second sample intermediate contour feature matrix with the first sample target skeleton feature matrix to obtain a third sample spliced feature matrix, and use a third basic transformation neural network to perform dimensional transformation on it to obtain a second sample target skeleton feature matrix.
Step 4.5: determine fourth predicted position information of the skeleton keypoints based on the second sample target skeleton feature matrix; determine a ninth loss based on the actual position information and the fourth predicted position information of the skeleton keypoints.
Here, the ninth loss is LS4 in FIG. 7.
Step 4.6: use a third basic convolutional neural network to perform convolution on the first sample directional skeleton feature matrix, obtaining a second sample intermediate skeleton feature matrix; splice the second sample intermediate skeleton feature matrix with the first sample target contour feature matrix to obtain a fourth sample spliced feature matrix, and use a fourth basic transformation neural network to perform dimensional transformation on it to obtain a second sample target contour feature matrix.
Step 4.7: determine fourth predicted position information of the contour keypoints based on the second sample target contour feature matrix; determine a tenth loss based on the actual position information and the fourth predicted position information of the contour keypoints.
Here, the tenth loss is LC4 in FIG. 7.
Step 4.8: perform this round of training of the third basic convolutional neural network, the fourth basic convolutional neural network, the third basic transformation neural network, and the fourth basic transformation neural network based on the ninth and tenth losses.
After multiple rounds of training of the first basic directional convolutional neural network, the second basic directional convolutional neural network, the third basic convolutional neural network, the fourth basic convolutional neural network, the third basic transformation neural network, and the fourth basic transformation neural network, the trained feature fusion neural network is obtained.
M3:
Referring to FIG. 10, another specific structure of a feature fusion neural network provided by an embodiment of the present disclosure includes: a displacement estimation neural network and a fifth transformation neural network.
Referring to FIG. 11, an embodiment of the present disclosure further provides a specific method, based on the feature fusion neural network of FIG. 10, for fusing the first target skeleton feature matrix and the first target contour feature matrix to obtain the second target skeleton feature matrix and the second target contour feature matrix, including the following steps.
S1101: Splice the first target skeleton feature matrix and the first target contour feature matrix to obtain a fifth spliced feature matrix.
S1102: Input the fifth spliced feature matrix into the displacement estimation neural network and perform displacement estimation on multiple predetermined groups of keypoint pairs, obtaining the displacement information for moving one keypoint of each pair to the other keypoint. The two keypoints of each pair are adjacent in position; the two keypoints comprise one skeleton keypoint and one contour keypoint, or two skeleton keypoints, or two contour keypoints.
In a specific implementation, multiple skeleton keypoints and multiple contour keypoints are determined in advance for the human body. FIG. 12 provides an example of such a predetermination. In this example there are 14 skeleton keypoints, shown as the larger dots in FIG. 12: the top of the head, the neck, the two shoulders, the two elbows, the two wrists, the two hips, the two knees, and the two ankles; there are 26 contour keypoints, shown as the smaller dots in FIG. 12. Except for the skeleton keypoint characterizing the top of the head, each skeleton keypoint corresponds to two contour keypoints; the two hip skeleton keypoints correspond to the same contour keypoint.
Two keypoints adjacent in position can form a keypoint pair. In FIG. 12, every two keypoints directly connected by a line segment form a keypoint pair. That is, a keypoint pair may take one of three forms: (skeleton keypoint, skeleton keypoint), (contour keypoint, contour keypoint), or (skeleton keypoint, contour keypoint).
The displacement estimation neural network includes multiple convolutional layers connected in sequence, which learn the skeleton and contour features in the fifth spliced feature matrix and produce, for each keypoint pair, the displacement information for moving one keypoint of the pair to the other. Two groups of displacement information correspond to each keypoint pair.
For example, if a keypoint pair is (P, Q), where P and Q each denote a keypoint, the displacement information of the pair includes: the displacement information for moving from P to Q, and the displacement information for moving from Q to P.
Each group of displacement information includes a movement direction and a movement distance.
S1103: Take each keypoint of each pair in turn as the current keypoint, and obtain, from the three-dimensional feature matrix corresponding to the other keypoint paired with the current keypoint, the two-dimensional feature matrix corresponding to that paired keypoint. If the paired keypoint is a skeleton keypoint, the three-dimensional feature matrix corresponding to it is the first skeleton feature matrix; if the paired keypoint is a contour keypoint, the three-dimensional feature matrix corresponding to it is the first contour feature matrix.
S1104: According to the displacement information from the paired keypoint to the current keypoint, perform a position transformation on the elements of the two-dimensional feature matrix corresponding to the paired keypoint, obtaining the displacement feature matrix corresponding to the current keypoint.
Here, still taking the keypoint pair (P, Q) as an example, first take P as the current keypoint and obtain the two-dimensional feature matrix corresponding to Q from Q's three-dimensional feature matrix.
If Q is a skeleton keypoint, Q's three-dimensional feature matrix is the first skeleton feature matrix (see S402 above); if Q is a contour keypoint, Q's three-dimensional feature matrix is the first contour feature matrix (see S403 above).
When Q is a skeleton keypoint, the first skeleton feature matrix is taken as Q's three-dimensional feature matrix and Q's two-dimensional feature matrix is obtained from it. This is because the first skeleton feature matrix contains only skeleton features, which makes the skeleton features learned in the subsequent processing more targeted. Likewise, when Q is a contour keypoint, the first contour feature matrix is taken as Q's three-dimensional feature matrix and Q's two-dimensional feature matrix is obtained from it, since the first contour feature matrix contains only contour features, making the contour features learned subsequently more targeted.
After Q's two-dimensional feature matrix is obtained, its elements are position-transformed based on the displacement information for moving from Q to P, obtaining the displacement feature matrix corresponding to P.
For example, as shown in FIG. 13, if the displacement information from Q to P is (2, 3), where 2 is the distance moved along the first dimension and 3 the distance moved along the second dimension, and Q's two-dimensional feature matrix is as shown at a in FIG. 13, then the displacement feature matrix corresponding to P, obtained by position-transforming the elements of Q's two-dimensional feature matrix, is as shown at b in FIG. 13. The numbers here are only a relative representation of the displacement; in an actual implementation, the displacement information should be interpreted in the context of the specific scheme, e.g., a displacement of "2" may mean 2 elements, 2 cells, and so on.
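A minimal sketch in Python (NumPy) of the position transformation illustrated in FIG. 13, assuming the displacement is interpreted in whole elements and vacated positions are zero-filled:

```python
import numpy as np

def shift_feature_map(fm, dx, dy):
    """Position-transform a 2-D feature matrix by displacement (dx, dy):
    every element moves dx along the first axis and dy along the second."""
    out = np.zeros_like(fm)
    h, w = fm.shape
    src = fm[max(0, -dx):h - max(0, dx), max(0, -dy):w - max(0, dy)]
    out[max(0, dx):h - max(0, -dx), max(0, dy):w - max(0, -dy)] = src
    return out

# For the FIG. 13 example, shift_feature_map(q_map, 2, 3) produces the
# displacement feature matrix corresponding to P from Q's 2-D feature matrix.
```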
Then take Q as the current keypoint, obtain P's two-dimensional feature matrix from P's three-dimensional feature matrix, and position-transform the elements of P's two-dimensional feature matrix based on the displacement information for moving from P to Q, obtaining the displacement feature matrix corresponding to Q.
In this way, a displacement feature matrix can be generated for each skeleton keypoint and for each contour keypoint.
Note that each skeleton keypoint may be paired with multiple keypoints, so each skeleton keypoint may also have multiple displacement feature matrices; each contour keypoint may likewise be paired with multiple keypoints and thus have multiple displacement feature matrices. Different contour keypoints may have different numbers of corresponding displacement feature matrices, and so may different skeleton keypoints.
S1105: For each skeleton keypoint, splice the two-dimensional feature matrix corresponding to that skeleton keypoint with each of the displacement feature matrices corresponding to that skeleton keypoint, obtaining that skeleton keypoint's spliced two-dimensional feature matrix; input the spliced two-dimensional feature matrix of that skeleton keypoint into the fifth transformation neural network, obtaining the target two-dimensional feature matrix corresponding to that skeleton keypoint; and generate the second target skeleton feature matrix based on the target two-dimensional feature matrices corresponding to the individual skeleton keypoints.
S1106: For each contour keypoint, splice the two-dimensional feature matrix corresponding to that contour keypoint with each of the displacement feature matrices corresponding to that contour keypoint, obtaining that contour keypoint's spliced two-dimensional feature matrix; input the spliced two-dimensional feature matrix of that contour keypoint into the fifth transformation neural network, obtaining the target two-dimensional feature matrix corresponding to that contour keypoint; and generate the second target contour feature matrix based on the target two-dimensional feature matrices corresponding to the individual contour keypoints.
For example, if P is a skeleton keypoint whose two-dimensional feature matrix is P', and P appears in three keypoint pairs, then by the above process three displacement feature matrices of P are obtained, P1', P2' and P3', and P', P1', P2' and P3' are spliced to obtain P's spliced two-dimensional feature matrix. In this case, among P's three displacement feature matrices, some may result from position-transforming the elements of a skeleton keypoint's two-dimensional feature matrix and others from position-transforming the elements of a contour keypoint's two-dimensional feature matrix. Splicing P', P1', P2' and P3' therefore fuses the features of all keypoints adjacent to P in position. The fifth transformation neural network then convolves P's spliced two-dimensional feature matrix, so that the resulting target two-dimensional feature matrix of P contains both skeleton and contour features, achieving the fusion of skeleton and contour features.
Likewise, if P is a contour keypoint, the fusion of skeleton and contour features can also be achieved by the above process.
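A minimal sketch in Python (PyTorch) of the splicing and the fifth transformation neural network of S1105 and S1106. Fixing the stack depth to one own matrix plus max_pairs displacement matrices, with zero-padding for keypoints that sit in fewer pairs, is an assumption made so a single network can serve every keypoint:

```python
import torch
import torch.nn as nn

class FifthTransform(nn.Module):
    """Maps a keypoint's spliced stack (its own 2-D feature matrix plus its
    displacement feature matrices) to a single target 2-D feature matrix."""
    def __init__(self, max_pairs=3):
        super().__init__()
        self.net = nn.Conv2d(1 + max_pairs, 1, kernel_size=3, padding=1)

    def forward(self, own_map, shifted_maps):
        # own_map: (H, W); shifted_maps: list of exactly max_pairs (H, W)
        # displacement matrices (the caller zero-pads shorter lists).
        stack = torch.stack([own_map, *shifted_maps]).unsqueeze(0)  # (1, 1+P, H, W)
        return self.net(stack).squeeze()  # target 2-D feature matrix (H, W)
```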
In another embodiment, the feature fusion neural network can be trained separately as follows.
Step 5.1: acquire the first sample target skeleton feature matrices and the first sample target contour feature matrices of multiple sample images.
The acquisition is similar to the way the first target skeleton and contour feature matrices are obtained in the above embodiments and is not repeated here. They can be acquired during joint training with the first feature extraction network, or with a pre-trained first feature extraction network.
Step 5.2: splice the first sample target skeleton feature matrix and the first sample target contour feature matrix to obtain a fifth sample spliced feature matrix.
Step 5.3: input the fifth sample spliced feature matrix into a basic displacement estimation neural network and perform displacement estimation on the predetermined groups of keypoint pairs, obtaining predicted displacement information for moving one keypoint of each pair to the other keypoint; the two keypoints of each pair are adjacent in position, and comprise one skeleton keypoint and one contour keypoint, or two skeleton keypoints, or two contour keypoints.
Step 5.4: take each keypoint of each pair in turn as the current keypoint, and obtain, from the sample three-dimensional feature matrix corresponding to the other keypoint paired with the current keypoint, the sample two-dimensional feature matrix corresponding to that paired keypoint.
Step 5.5: according to the predicted displacement information from the paired keypoint to the current keypoint, perform a position transformation on the elements of the sample two-dimensional feature matrix corresponding to the paired keypoint, obtaining the sample displacement feature matrix corresponding to the current keypoint.
Step 5.6: determine a displacement loss based on the sample displacement feature matrix corresponding to the current keypoint and the sample two-dimensional feature matrix corresponding to the current keypoint.
Step 5.7: perform this round of training of the displacement estimation neural network based on the displacement loss.
Step 5.8: for each skeleton keypoint, splice the sample two-dimensional feature matrix corresponding to that skeleton keypoint with each of the sample displacement feature matrices corresponding to that skeleton keypoint, obtaining that skeleton keypoint's sample spliced two-dimensional feature matrix; input it into a fifth basic transformation neural network, obtaining the sample target two-dimensional feature matrix corresponding to that skeleton keypoint; and generate a second sample target skeleton feature matrix based on the sample target two-dimensional feature matrices corresponding to the individual skeleton keypoints.
Step 5.9: for each contour keypoint, splice the sample two-dimensional feature matrix corresponding to that contour keypoint with each of the sample displacement feature matrices corresponding to that contour keypoint, obtaining that contour keypoint's sample spliced two-dimensional feature matrix; input it into the fifth basic transformation neural network, obtaining the sample target two-dimensional feature matrix corresponding to that contour keypoint; and generate a second sample target contour feature matrix based on the sample target two-dimensional feature matrices corresponding to the individual contour keypoints.
Step 5.10: determine a transformation loss based on the second sample target skeleton feature matrix, the second sample target contour feature matrix, the actual position information of the skeleton keypoints, and the actual position information of the contour keypoints. For example, predicted position information of the skeleton keypoints can be determined from the second sample target skeleton feature matrix and predicted position information of the contour keypoints from the second sample target contour feature matrix, and the transformation loss determined from the predicted and actual position information of the skeleton keypoints together with the predicted and actual position information of the contour keypoints.
Step 5.11: perform this round of training of the fifth basic transformation neural network based on the transformation loss.
Step 5.12: after multiple rounds of training of the basic displacement estimation neural network and the fifth basic transformation neural network, the feature fusion neural network is obtained.
B: Perform feature extraction on the image to be detected multiple times; after each extraction, fuse the skeleton and contour features obtained from that extraction, and determine the position information of the skeleton and contour keypoints based on the feature fusion result of the last feature fusion.
When multiple feature extractions are performed, the (i+1)-th feature extraction is performed based on the feature fusion result of the i-th feature fusion, i being a positive integer.
In B, the process of the first feature extraction is the same as the extraction of skeleton and contour features from the image to be detected in A above and is not repeated here.
The specific process of each feature extraction in B other than the first includes:
using the second feature extraction network to extract, from the feature fusion result of the previous feature fusion, a first target skeleton feature matrix of the skeleton keypoints characterizing the human skeleton features, and a first target contour feature matrix of the contour keypoints characterizing the human contour features;
wherein the network parameters of the first feature extraction network and the second feature extraction network differ, and different rounds of feature extraction use second feature extraction networks with different network parameters.
Here, the first and second feature extraction networks both include multiple convolutional layers. The network parameters of the first and second feature extraction networks include, for example but without limitation: the number of convolutional layers, the size of the convolution kernels used in each layer, the number of kernels used in each layer, and so on.
Referring to FIG. 14, an embodiment of the present disclosure provides a schematic structural diagram of a second feature extraction network, which includes: a second skeleton feature extraction network and a second contour feature extraction network.
The feature fusion result of the previous feature fusion used by this second feature extraction network for the current extraction includes: the second target skeleton feature matrix and the second target contour feature matrix; for the specific process of obtaining these two matrices, see A above, not repeated here.
The specific process by which this second feature extraction network extracts, from the feature fusion result of the previous feature fusion, the first target skeleton feature matrix and the first target contour feature matrix is, for example:
use the second skeleton feature extraction network to perform convolution processing on the second target skeleton feature matrix obtained from the previous feature fusion, obtaining a third skeleton feature matrix, and obtain a fourth skeleton feature matrix from a third target convolutional layer in the second skeleton feature extraction network; based on the third and fourth skeleton feature matrices, obtain a fifth target skeleton feature matrix. The third target convolutional layer is any convolutional layer in the second skeleton feature extraction network other than the last one.
use the second contour feature extraction network to perform convolution processing on the second target contour feature matrix obtained from the previous feature fusion, obtaining a third contour feature matrix, and obtain a fourth contour feature matrix from a fourth target convolutional layer in the second contour feature extraction network; based on the third and fourth contour feature matrices, obtain a sixth target contour feature matrix. The fourth target convolutional layer is any convolutional layer in the second contour feature extraction network other than the last one.
The specific processing is similar to the process in A above of extracting the first target skeleton feature matrix and the first target contour feature matrix from the image to be detected with the first skeleton feature extraction network, and is not repeated here.
The above embodiments describe the ways of determining the position information of the skeleton keypoints and the contour keypoints in II above.
III: After the position information of the skeleton keypoints and of the contour keypoints is obtained based on II above, the position of each skeleton keypoint and of each contour keypoint can be determined in the image to be detected, and the human body detection result can then be generated.
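A minimal sketch in Python (NumPy) of one way to read keypoint positions out of the final target feature matrices; the per-channel argmax decoder is an assumed convention, since the text does not fix how positions are read from the probability maps:

```python
import numpy as np

def decode_keypoints(heatmaps):
    """Read keypoint positions out of a 3-D feature matrix.

    heatmaps: (K, H, W), where each element's value is the probability that
    the corresponding pixels belong to keypoint k.
    Returns an (K, 2) array of (x, y) positions, one per keypoint.
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1).argmax(axis=1)  # index of the peak per channel
    return np.stack([flat % w, flat // w], axis=1)
```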
The human body detection result includes one or more of: the image to be detected with skeleton keypoint markers and contour keypoint markers; and a data set including the position information of the skeleton keypoints and the position information of the contour keypoints.
Subsequently, one or more of the following operations can also be performed based on the human body detection result: human action recognition, human pose detection, human contour adjustment, human image editing, and human body sticker pasting.
Here, action recognition means, for example, recognizing the action a person is currently performing, such as fighting or running; human pose recognition means, for example, recognizing the current pose, such as lying down or whether a specified action is performed; human contour adjustment means, for example, adjusting the body's shape or height; human image editing means, for example, scaling, rotating or cropping the body; and human body sticker pasting means, for example, detecting the body in image A and pasting the corresponding body image into image B.
Embodiments of the present disclosure can determine, from an image to be detected, the position information of skeleton keypoints characterizing the human skeletal structure and of contour keypoints characterizing the human body contour, and generate a human body detection result from both, improving the fineness of the representation while keeping the amount of computation in check.
In addition, since embodiments of the present disclosure obtain the detection result from the position information of skeleton keypoints characterizing the skeletal structure together with that of contour keypoints characterizing the body contour, the information representing the human body is richer, supporting broader application scenarios such as image editing and body shape adjustment.
Based on the same technical concept, an embodiment of the present disclosure further provides a human body detection apparatus corresponding to the human body detection method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to the above human body detection method, the implementation of the apparatus may refer to the implementation of the method; repeated parts are not described again.
Referring to FIG. 15, a schematic diagram of a human body detection apparatus provided by an embodiment of the present disclosure is shown. The apparatus includes: an acquisition module 151, a detection module 152, and a generation module 153, wherein the acquisition module 151 is configured to acquire an image to be detected; the detection module 152 is configured to determine, based on the image to be detected, position information of skeleton keypoints used to characterize the human skeletal structure and position information of contour keypoints used to characterize the human body contour; and the generation module 153 is configured to generate a human body detection result based on the position information of the skeleton keypoints and the position information of the contour keypoints.
In a possible implementation, the contour keypoints include main contour keypoints and auxiliary contour keypoints, wherein at least one auxiliary contour keypoint exists between two adjacent main contour keypoints.
In a possible implementation, the detection module 152 is configured to determine, based on the image to be detected, the position information of the contour keypoints characterizing the human body contour in the following manner: determine the position information of the main contour keypoints based on the image to be detected; determine human body contour information based on the position information of the main contour keypoints; and determine the position information of a plurality of the auxiliary contour keypoints based on the determined human body contour information.
In a possible implementation, the human body detection result includes one or more of: the image to be detected with skeleton keypoint markers and contour keypoint markers added; and a data set including the position information of the skeleton keypoints and the position information of the contour keypoints.
In a possible implementation, the human body detection apparatus further includes: an execution module 154 configured to perform, based on the human body detection result, one or more of the following operations: human action recognition, human pose detection, human contour adjustment, human image editing, and human body sticker pasting.
In a possible implementation, the detection module 152 is configured to determine, based on the image to be detected, the position information of the skeleton keypoints characterizing the human skeletal structure and of the contour keypoints characterizing the human body contour in the following manner: perform feature extraction based on the image to be detected to obtain skeleton features and contour features, and fuse the obtained skeleton and contour features; and determine the position information of the skeleton keypoints and of the contour keypoints based on the feature fusion result.
In a possible implementation, the detection module 152 is configured to perform the feature extraction and fusion in the following manner: perform feature extraction at least once based on the image to be detected, and fuse the skeleton and contour features obtained from each extraction, wherein, in the case of multiple feature extractions, the (i+1)-th feature extraction is performed based on the feature fusion result of the i-th feature fusion, i being a positive integer; the detection module 152 is configured to determine, based on the feature fusion result, the position information of the skeleton keypoints and of the contour keypoints in the following manner: determine the position information of the skeleton keypoints and of the contour keypoints based on the feature fusion result of the last feature fusion.
In a possible implementation, the detection module 152 is configured to perform feature extraction at least once based on the image to be detected in the following manner: in the first feature extraction, use a pre-trained first feature extraction network to extract, from the image to be detected, a first target skeleton feature matrix of skeleton keypoints characterizing human skeleton features, and a first target contour feature matrix of contour keypoints characterizing human contour features; in the (i+1)-th feature extraction, use a pre-trained second feature extraction network to extract the first target skeleton feature matrix and the first target contour feature matrix from the feature fusion result of the i-th feature fusion; wherein the network parameters of the first and second feature extraction networks differ, and different rounds of feature extraction use second feature extraction networks with different network parameters.
In a possible implementation, the detection module 152 is configured to fuse the extracted skeleton and contour features in the following manner: use a pre-trained feature fusion neural network to fuse the first target skeleton feature matrix and the first target contour feature matrix, obtaining a second target skeleton feature matrix and a second target contour feature matrix; wherein the second target skeleton feature matrix is a three-dimensional skeleton feature matrix including a two-dimensional skeleton feature matrix for each skeleton keypoint, the value of each element characterizing the probability that the corresponding pixel belongs to the corresponding skeleton keypoint; the second target contour feature matrix is a three-dimensional contour feature matrix including a two-dimensional contour feature matrix for each contour keypoint, the value of each element characterizing the probability that the corresponding pixel belongs to the corresponding contour keypoint; and different rounds of feature fusion use feature fusion neural networks with different network parameters.
In a possible implementation, the detection module 152 is configured to determine the keypoint positions based on the feature fusion result of the last feature fusion in the following manner: determine the position information of the skeleton keypoints based on the second target skeleton feature matrix obtained from the last feature fusion; and determine the position information of the contour keypoints based on the second target contour feature matrix obtained from the last feature fusion.
In a possible implementation, the first feature extraction network includes: a shared feature extraction network, a first skeleton feature extraction network, and a first contour feature extraction network; the detection module 152 is configured to extract, with the first feature extraction network, the first target skeleton feature matrix of the skeleton keypoints characterizing human skeleton features and the first target contour feature matrix of the contour keypoints characterizing human contour features from the image to be detected in the following manner:
use the shared feature extraction network to perform convolution processing on the image to be detected, obtaining a basic feature matrix containing skeleton and contour features; use the first skeleton feature extraction network to perform convolution processing on the basic feature matrix to obtain a first skeleton feature matrix, obtain a second skeleton feature matrix from a first target convolutional layer in the first skeleton feature extraction network, and obtain the first target skeleton feature matrix based on the first and second skeleton feature matrices, the first target convolutional layer being any convolutional layer in the first skeleton feature extraction network other than the last one; use the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, obtain a second contour feature matrix from a second target convolutional layer in the first contour feature extraction network, and obtain the first target contour feature matrix based on the first and second contour feature matrices, the second target convolutional layer being any convolutional layer in the first contour feature extraction network other than the last one.
In a possible implementation, the detection module 152 is configured to obtain the first target skeleton feature matrix based on the first and second skeleton feature matrices in the following manner: splice the first skeleton feature matrix with the second skeleton feature matrix to obtain a first spliced skeleton feature matrix, and perform dimensional transformation processing on it to obtain the first target skeleton feature matrix; and obtaining the first target contour feature matrix based on the first and second contour feature matrices includes: splicing the first contour feature matrix with the second contour feature matrix to obtain a first spliced contour feature matrix, and performing dimensional transformation processing on it to obtain the first target contour feature matrix; wherein the first target skeleton feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each dimension.
In a possible implementation, the feature fusion neural network includes: a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network;
the detection module 152 is configured to fuse the first target skeleton feature matrix and the first target contour feature matrix with the feature fusion neural network in the following manner, to obtain the second target skeleton feature matrix and the second target contour feature matrix: use the first convolutional neural network to perform convolution processing on the first target skeleton feature matrix to obtain a first intermediate skeleton feature matrix, and use the second convolutional neural network to perform convolution processing on the first target contour feature matrix to obtain a first intermediate contour feature matrix; splice the first intermediate contour feature matrix with the first target skeleton feature matrix to obtain a first spliced feature matrix, and use the first transformation neural network to perform dimensional transformation on it to obtain the second target skeleton feature matrix; splice the first intermediate skeleton feature matrix with the first target contour feature matrix to obtain a second spliced feature matrix, and use the second transformation neural network to perform dimensional transformation on it to obtain the second target contour feature matrix.
In a possible implementation, the feature fusion neural network includes: a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, a fourth convolutional neural network, a third transformation neural network, and a fourth transformation neural network;
the detection module 152 is configured to fuse the first target skeleton feature matrix and the first target contour feature matrix with the feature fusion neural network in the following manner, to obtain the second target skeleton feature matrix and the second target contour feature matrix: use the first directional convolutional neural network to perform directional convolution processing on the first target skeleton feature matrix to obtain a first directional skeleton feature matrix, and use the third convolutional neural network to perform convolution processing on it to obtain a second intermediate skeleton feature matrix; use the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix to obtain a first directional contour feature matrix, and use the fourth convolutional neural network to perform convolution processing on it to obtain a second intermediate contour feature matrix; splice the second intermediate contour feature matrix with the first target skeleton feature matrix to obtain a third spliced feature matrix, and use the third transformation neural network to perform dimensional transformation on it to obtain the second target skeleton feature matrix; splice the second intermediate skeleton feature matrix with the first target contour feature matrix to obtain a fourth spliced feature matrix, and use the fourth transformation neural network to perform dimensional transformation on it to obtain the second target contour feature matrix.
In a possible implementation, the feature fusion neural network includes: a displacement estimation neural network and a fifth transformation neural network;
the detection module 152 is configured to fuse the first target skeleton feature matrix and the first target contour feature matrix with the feature fusion neural network in the following manner, to obtain the second target skeleton feature matrix and the second target contour feature matrix: splice the first target skeleton feature matrix and the first target contour feature matrix to obtain a fifth spliced feature matrix; input the fifth spliced feature matrix into the displacement estimation neural network and perform displacement estimation on the predetermined groups of keypoint pairs, obtaining the displacement information for moving one keypoint of each pair to the other keypoint; take each keypoint of each pair in turn as the current keypoint and obtain, from the three-dimensional feature matrix corresponding to the other keypoint paired with the current keypoint, the two-dimensional feature matrix corresponding to that paired keypoint; according to the displacement information from the paired keypoint to the current keypoint, perform a position transformation on the elements of the two-dimensional feature matrix corresponding to the paired keypoint, obtaining the displacement feature matrix corresponding to the current keypoint; for each skeleton keypoint, splice its two-dimensional feature matrix with each of its displacement feature matrices to obtain its spliced two-dimensional feature matrix, input it into the fifth transformation neural network to obtain its target two-dimensional feature matrix, and generate the second target skeleton feature matrix based on the target two-dimensional feature matrices corresponding to the individual skeleton keypoints; for each contour keypoint, splice its two-dimensional feature matrix with each of its displacement feature matrices to obtain its spliced two-dimensional feature matrix, input it into the fifth transformation neural network to obtain its target two-dimensional feature matrix, and generate the second target contour feature matrix based on the target two-dimensional feature matrices corresponding to the individual contour keypoints.
In a possible implementation, the human body detection method is implemented by a human body detection model; the human body detection model includes: the first feature extraction network and/or the feature fusion neural network; the human body detection model is trained using sample images in a training sample set, the sample images being annotated with the actual position information of the skeleton keypoints of the human skeletal structure and the actual position information of the contour keypoints of the human body contour.
For descriptions of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the above method embodiments; details are not repeated here.
An embodiment of the present disclosure further provides a computer device. FIG. 16 is a schematic structural diagram of the computer device provided by the embodiment of the present disclosure, including:
a processor 11, a storage medium 12 and a bus 13. The storage medium 12 is used to store execution instructions and includes a memory 121 and an external storage 122; the memory 121, also called internal memory, temporarily stores the processing data in the processor 11 and the data exchanged with the external storage 122 such as a hard disk, and the processor 11 exchanges data with the external storage 122 through the memory 121. When the computer device 100 runs, the processor 11 communicates with the storage medium 12 through the bus 13, so that the processor 11 executes the following instructions: acquire an image to be detected; determine, based on the image to be detected, position information of skeleton keypoints used to characterize the human skeletal structure and position information of contour keypoints used to characterize the human body contour; and generate a human body detection result based on the position information of the skeleton keypoints and the position information of the contour keypoints.
An embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon; when the computer program is run by a processor, the steps of the human body detection method described in the above method embodiments are executed.
The computer program product of the human body detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the human body detection method described in the above method embodiments, to which reference may be made for details; they are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems and apparatuses described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through communication interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile, processor-executable computer-readable storage medium. Based on this understanding, the part of the technical solution of the present disclosure that in essence contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
Finally, it should be noted that the embodiments described above are merely specific implementations of the present disclosure, used to illustrate its technical solutions rather than limit them; the scope of protection of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the art can, within the technical scope disclosed herein, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes or replacements do not remove the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the scope of protection of the present disclosure. The scope of protection of the present disclosure shall therefore be that of the claims.

Claims (32)

  1. A human body detection method, characterized by comprising:
    acquiring an image to be detected;
    determining, based on the image to be detected, position information of skeleton keypoints used to characterize the human skeletal structure and position information of contour keypoints used to characterize the human body contour; and
    generating a human body detection result based on the position information of the skeleton keypoints and the position information of the contour keypoints.
  2. The human body detection method according to claim 1, characterized in that the contour keypoints comprise main contour keypoints and auxiliary contour keypoints, wherein at least one said auxiliary contour keypoint exists between two adjacent main contour keypoints.
  3. The human body detection method according to claim 2, characterized in that determining, based on the image to be detected, the position information of the contour keypoints used to characterize the human body contour comprises:
    determining position information of the main contour keypoints based on the image to be detected;
    determining human body contour information based on the position information of the main contour keypoints; and
    determining position information of a plurality of the auxiliary contour keypoints based on the determined human body contour information.
  4. The human body detection method according to any one of claims 1 to 3, characterized in that the human body detection result comprises one or more of:
    the image to be detected with skeleton keypoint markers and contour keypoint markers added;
    a data set comprising the position information of the skeleton keypoints and the position information of the contour keypoints.
  5. The human body detection method according to claim 4, characterized in that the method further comprises:
    performing, based on the human body detection result, one or more of the following operations: human action recognition, human pose detection, human contour adjustment, human image editing, and human body sticker pasting.
  6. The human body detection method according to any one of claims 1 to 5, characterized in that determining, based on the image to be detected, the position information of the skeleton keypoints used to characterize the human skeletal structure and the position information of the contour keypoints used to characterize the human body contour comprises:
    performing feature extraction based on the image to be detected to obtain skeleton features and contour features, and fusing the obtained skeleton features and contour features; and determining the position information of the skeleton keypoints and the position information of the contour keypoints based on the feature fusion result.
  7. The human body detection method according to claim 6, characterized in that performing feature extraction based on the image to be detected to obtain skeleton features and contour features, and fusing the obtained skeleton features and contour features, comprises:
    performing feature extraction at least once based on the image to be detected, and fusing the skeleton features and contour features obtained from each feature extraction, wherein, in the case of multiple feature extractions, the (i+1)-th feature extraction is performed based on the feature fusion result of the i-th feature fusion, i being a positive integer;
    and determining, based on the feature fusion result, the position information of the skeleton keypoints used to characterize the human skeletal structure and of the contour keypoints used to characterize the human body contour comprises:
    determining the position information of the skeleton keypoints and the position information of the contour keypoints based on the feature fusion result of the last feature fusion.
  8. The human body detection method according to claim 7, characterized in that performing feature extraction at least once based on the image to be detected comprises:
    in the first feature extraction, using a pre-trained first feature extraction network to extract, from the image to be detected, a first target skeleton feature matrix of skeleton keypoints characterizing human skeleton features, and a first target contour feature matrix of contour keypoints characterizing human contour features;
    in the (i+1)-th feature extraction, using a pre-trained second feature extraction network to extract the first target skeleton feature matrix and the first target contour feature matrix from the feature fusion result of the i-th feature fusion;
    wherein the network parameters of the first feature extraction network and the second feature extraction network differ, and different rounds of feature extraction use second feature extraction networks with different network parameters.
  9. The human body detection method according to claim 8, characterized in that fusing the extracted skeleton features and contour features comprises:
    using a pre-trained feature fusion neural network to fuse the first target skeleton feature matrix and the first target contour feature matrix, obtaining a second target skeleton feature matrix and a second target contour feature matrix;
    wherein the second target skeleton feature matrix is a three-dimensional skeleton feature matrix comprising a two-dimensional skeleton feature matrix corresponding to each skeleton keypoint, the value of each element in a two-dimensional skeleton feature matrix characterizing the probability that the pixel corresponding to that element belongs to the corresponding skeleton keypoint;
    the second target contour feature matrix is a three-dimensional contour feature matrix comprising a two-dimensional contour feature matrix corresponding to each contour keypoint, the value of each element in a two-dimensional contour feature matrix characterizing the probability that the pixel corresponding to that element belongs to the corresponding contour keypoint; and
    different rounds of feature fusion use feature fusion neural networks with different network parameters.
  10. The human body detection method according to claim 8, characterized in that the first feature extraction network comprises: a shared feature extraction network, a first skeleton feature extraction network, and a first contour feature extraction network;
    extracting the first target skeleton feature matrix and the first target contour feature matrix from the image to be detected with the first feature extraction network comprises:
    using the shared feature extraction network to perform convolution processing on the image to be detected, obtaining a basic feature matrix containing skeleton features and contour features;
    using the first skeleton feature extraction network to perform convolution processing on the basic feature matrix to obtain a first skeleton feature matrix, and obtaining a second skeleton feature matrix from a first target convolutional layer in the first skeleton feature extraction network; obtaining the first target skeleton feature matrix based on the first skeleton feature matrix and the second skeleton feature matrix; the first target convolutional layer being any convolutional layer in the first skeleton feature extraction network other than the last convolutional layer;
    using the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, and obtaining a second contour feature matrix from a second target convolutional layer in the first contour feature extraction network; obtaining the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix; the second target convolutional layer being any convolutional layer in the first contour feature extraction network other than the last convolutional layer.
  11. The human body detection method according to claim 10, characterized in that obtaining the first target skeleton feature matrix based on the first skeleton feature matrix and the second skeleton feature matrix comprises:
    splicing the first skeleton feature matrix with the second skeleton feature matrix to obtain a first spliced skeleton feature matrix; performing dimensional transformation processing on the first spliced skeleton feature matrix to obtain the first target skeleton feature matrix;
    and obtaining the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix comprises:
    splicing the first contour feature matrix with the second contour feature matrix to obtain a first spliced contour feature matrix; performing dimensional transformation processing on the first spliced contour feature matrix to obtain the first target contour feature matrix;
    wherein the first target skeleton feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each dimension.
  12. The human body detection method according to claim 9, characterized in that the feature fusion neural network comprises: a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network;
    fusing the first target skeleton feature matrix and the first target contour feature matrix with the feature fusion neural network to obtain the second target skeleton feature matrix and the second target contour feature matrix comprises:
    using the first convolutional neural network to perform convolution processing on the first target skeleton feature matrix to obtain a first intermediate skeleton feature matrix, and using the second convolutional neural network to perform convolution processing on the first target contour feature matrix to obtain a first intermediate contour feature matrix;
    splicing the first intermediate contour feature matrix with the first target skeleton feature matrix to obtain a first spliced feature matrix, and using the first transformation neural network to perform dimensional transformation on the first spliced feature matrix to obtain the second target skeleton feature matrix;
    splicing the first intermediate skeleton feature matrix with the first target contour feature matrix to obtain a second spliced feature matrix, and using the second transformation neural network to perform dimensional transformation on the second spliced feature matrix to obtain the second target contour feature matrix.
  13. The human body detection method according to claim 9, characterized in that the feature fusion neural network comprises: a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, a fourth convolutional neural network, a third transformation neural network, and a fourth transformation neural network;
    fusing the first target skeleton feature matrix and the first target contour feature matrix with the feature fusion neural network to obtain the second target skeleton feature matrix and the second target contour feature matrix comprises:
    using the first directional convolutional neural network to perform directional convolution processing on the first target skeleton feature matrix to obtain a first directional skeleton feature matrix, and using the third convolutional neural network to perform convolution processing on the first directional skeleton feature matrix to obtain a second intermediate skeleton feature matrix;
    using the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix to obtain a first directional contour feature matrix, and using the fourth convolutional neural network to perform convolution processing on the first directional contour feature matrix to obtain a second intermediate contour feature matrix;
    splicing the second intermediate contour feature matrix with the first target skeleton feature matrix to obtain a third spliced feature matrix, and using the third transformation neural network to perform dimensional transformation on the third spliced feature matrix to obtain the second target skeleton feature matrix;
    splicing the second intermediate skeleton feature matrix with the first target contour feature matrix to obtain a fourth spliced feature matrix, and using the fourth transformation neural network to perform dimensional transformation on the fourth spliced feature matrix to obtain the second target contour feature matrix.
  14. The human body detection method according to claim 9, characterized in that the feature fusion neural network comprises: a displacement estimation neural network and a fifth transformation neural network;
    fusing the first target skeleton feature matrix and the first target contour feature matrix with the feature fusion neural network to obtain the second target skeleton feature matrix and the second target contour feature matrix comprises:
    splicing the first target skeleton feature matrix and the first target contour feature matrix to obtain a fifth spliced feature matrix;
    inputting the fifth spliced feature matrix into the displacement estimation neural network and performing displacement estimation on multiple predetermined groups of keypoint pairs, obtaining displacement information for moving one keypoint of each pair to the other keypoint; taking each keypoint of each pair in turn as a current keypoint and obtaining, from the three-dimensional feature matrix corresponding to the other keypoint paired with the current keypoint, the two-dimensional feature matrix corresponding to that paired keypoint;
    according to the displacement information from the paired keypoint to the current keypoint, performing a position transformation on the elements of the two-dimensional feature matrix corresponding to the paired keypoint, obtaining the displacement feature matrix corresponding to the current keypoint;
    for each skeleton keypoint, splicing the two-dimensional feature matrix corresponding to that skeleton keypoint with each displacement feature matrix corresponding to that skeleton keypoint, obtaining that skeleton keypoint's spliced two-dimensional feature matrix; inputting the spliced two-dimensional feature matrix of that skeleton keypoint into the fifth transformation neural network, obtaining the target two-dimensional feature matrix corresponding to that skeleton keypoint; and generating the second target skeleton feature matrix based on the target two-dimensional feature matrices corresponding to the individual skeleton keypoints;
    for each contour keypoint, splicing the two-dimensional feature matrix corresponding to that contour keypoint with each displacement feature matrix corresponding to that contour keypoint, obtaining that contour keypoint's spliced two-dimensional feature matrix; inputting the spliced two-dimensional feature matrix of that contour keypoint into the fifth transformation neural network, obtaining the target two-dimensional feature matrix corresponding to that contour keypoint; and generating the second target contour feature matrix based on the target two-dimensional feature matrices corresponding to the individual contour keypoints.
  15. The human body detection method according to any one of claims 1 to 14, characterized in that the human body detection method is implemented by a human body detection model; the human body detection model comprises: a first feature extraction network and/or a feature fusion neural network;
    the human body detection model is trained using sample images in a training sample set, the sample images being annotated with actual position information of the skeleton keypoints of the human skeletal structure and actual position information of the contour keypoints of the human body contour.
  16. A human body detection apparatus, characterized by comprising:
    an acquisition module configured to acquire an image to be detected;
    a detection module configured to determine, based on the image to be detected, position information of skeleton keypoints used to characterize the human skeletal structure and position information of contour keypoints used to characterize the human body contour; and
    a generation module configured to generate a human body detection result based on the position information of the skeleton keypoints and the position information of the contour keypoints.
  17. The human body detection apparatus according to claim 16, characterized in that the contour keypoints comprise main contour keypoints and auxiliary contour keypoints, wherein at least one said auxiliary contour keypoint exists between two adjacent main contour keypoints.
  18. The human body detection apparatus according to claim 17, characterized in that the detection module is configured to determine, based on the image to be detected, the position information of the contour keypoints used to characterize the human body contour in the following manner:
    determining position information of the main contour keypoints based on the image to be detected;
    determining human body contour information based on the position information of the main contour keypoints; and
    determining position information of a plurality of the auxiliary contour keypoints based on the determined human body contour information.
  19. The human body detection apparatus according to any one of claims 16 to 18, characterized in that the human body detection result comprises one or more of:
    the image to be detected with skeleton keypoint markers and contour keypoint markers added;
    a data set comprising the position information of the skeleton keypoints and the position information of the contour keypoints.
  20. The human body detection apparatus according to claim 19, characterized in that the human body detection apparatus further comprises:
    an execution module configured to perform, based on the human body detection result, one or more of the following operations: human action recognition, human pose detection, human contour adjustment, human image editing, and human body sticker pasting.
  21. The human body detection apparatus according to any one of claims 16 to 20, characterized in that the detection module is configured to determine, based on the image to be detected, the position information of the skeleton keypoints used to characterize the human skeletal structure and of the contour keypoints used to characterize the human body contour in the following manner:
    performing feature extraction based on the image to be detected to obtain skeleton features and contour features, and fusing the obtained skeleton features and contour features; and determining the position information of the skeleton keypoints and the position information of the contour keypoints based on the feature fusion result.
  22. The human body detection apparatus according to claim 21, characterized in that the detection module is configured to perform the feature extraction and fusion in the following manner:
    performing feature extraction at least once based on the image to be detected, and fusing the skeleton features and contour features obtained from each feature extraction, wherein, in the case of multiple feature extractions, the (i+1)-th feature extraction is performed based on the feature fusion result of the i-th feature fusion, i being a positive integer;
    and the detection module is configured to determine, based on the feature fusion result, the position information of the skeleton keypoints and of the contour keypoints in the following manner:
    determining the position information of the skeleton keypoints and the position information of the contour keypoints based on the feature fusion result of the last feature fusion.
  23. The human body detection apparatus according to claim 22, characterized in that the detection module is configured to perform feature extraction at least once based on the image to be detected in the following manner:
    in the first feature extraction, using a pre-trained first feature extraction network to extract, from the image to be detected, a first target skeleton feature matrix of skeleton keypoints characterizing human skeleton features, and a first target contour feature matrix of contour keypoints characterizing human contour features;
    in the (i+1)-th feature extraction, using a pre-trained second feature extraction network to extract the first target skeleton feature matrix and the first target contour feature matrix from the feature fusion result of the i-th feature fusion;
    wherein the network parameters of the first feature extraction network and the second feature extraction network differ, and different rounds of feature extraction use second feature extraction networks with different network parameters.
  24. The human body detection apparatus according to claim 23, characterized in that the detection module is configured to fuse the extracted skeleton features and contour features in the following manner:
    using a pre-trained feature fusion neural network to fuse the first target skeleton feature matrix and the first target contour feature matrix, obtaining a second target skeleton feature matrix and a second target contour feature matrix;
    wherein the second target skeleton feature matrix is a three-dimensional skeleton feature matrix comprising a two-dimensional skeleton feature matrix corresponding to each skeleton keypoint, the value of each element in a two-dimensional skeleton feature matrix characterizing the probability that the pixel corresponding to that element belongs to the corresponding skeleton keypoint;
    the second target contour feature matrix is a three-dimensional contour feature matrix comprising a two-dimensional contour feature matrix corresponding to each contour keypoint, the value of each element in a two-dimensional contour feature matrix characterizing the probability that the pixel corresponding to that element belongs to the corresponding contour keypoint; and
    different rounds of feature fusion use feature fusion neural networks with different network parameters.
  25. The human body detection apparatus according to claim 23, characterized in that the first feature extraction network comprises: a shared feature extraction network, a first skeleton feature extraction network, and a first contour feature extraction network;
    the detection module is configured to extract the first target skeleton feature matrix and the first target contour feature matrix from the image to be detected with the first feature extraction network in the following manner:
    using the shared feature extraction network to perform convolution processing on the image to be detected, obtaining a basic feature matrix containing skeleton features and contour features;
    using the first skeleton feature extraction network to perform convolution processing on the basic feature matrix to obtain a first skeleton feature matrix, and obtaining a second skeleton feature matrix from a first target convolutional layer in the first skeleton feature extraction network; obtaining the first target skeleton feature matrix based on the first skeleton feature matrix and the second skeleton feature matrix; the first target convolutional layer being any convolutional layer in the first skeleton feature extraction network other than the last convolutional layer;
    using the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, and obtaining a second contour feature matrix from a second target convolutional layer in the first contour feature extraction network; obtaining the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix; the second target convolutional layer being any convolutional layer in the first contour feature extraction network other than the last convolutional layer.
  26. The human body detection apparatus according to claim 25, characterized in that the detection module is configured to obtain the first target skeleton feature matrix based on the first skeleton feature matrix and the second skeleton feature matrix in the following manner:
    splicing the first skeleton feature matrix with the second skeleton feature matrix to obtain a first spliced skeleton feature matrix;
    performing dimensional transformation processing on the first spliced skeleton feature matrix to obtain the first target skeleton feature matrix;
    and obtaining the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix comprises:
    splicing the first contour feature matrix with the second contour feature matrix to obtain a first spliced contour feature matrix;
    performing dimensional transformation processing on the first spliced contour feature matrix to obtain the first target contour feature matrix;
    wherein the first target skeleton feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each dimension.
  27. The human body detection apparatus according to claim 24, characterized in that the feature fusion neural network comprises: a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network;
    the detection module is configured to fuse the first target skeleton feature matrix and the first target contour feature matrix with the feature fusion neural network in the following manner, to obtain the second target skeleton feature matrix and the second target contour feature matrix:
    using the first convolutional neural network to perform convolution processing on the first target skeleton feature matrix to obtain a first intermediate skeleton feature matrix, and using the second convolutional neural network to perform convolution processing on the first target contour feature matrix to obtain a first intermediate contour feature matrix;
    splicing the first intermediate contour feature matrix with the first target skeleton feature matrix to obtain a first spliced feature matrix, and using the first transformation neural network to perform dimensional transformation on the first spliced feature matrix to obtain the second target skeleton feature matrix;
    splicing the first intermediate skeleton feature matrix with the first target contour feature matrix to obtain a second spliced feature matrix, and using the second transformation neural network to perform dimensional transformation on the second spliced feature matrix to obtain the second target contour feature matrix.
  28. The human body detection apparatus according to claim 24, characterized in that the feature fusion neural network comprises: a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, a fourth convolutional neural network, a third transformation neural network, and a fourth transformation neural network;
    the detection module is configured to fuse the first target skeleton feature matrix and the first target contour feature matrix with the feature fusion neural network in the following manner, to obtain the second target skeleton feature matrix and the second target contour feature matrix:
    using the first directional convolutional neural network to perform directional convolution processing on the first target skeleton feature matrix to obtain a first directional skeleton feature matrix, and using the third convolutional neural network to perform convolution processing on the first directional skeleton feature matrix to obtain a second intermediate skeleton feature matrix;
    using the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix to obtain a first directional contour feature matrix, and using the fourth convolutional neural network to perform convolution processing on the first directional contour feature matrix to obtain a second intermediate contour feature matrix;
    splicing the second intermediate contour feature matrix with the first target skeleton feature matrix to obtain a third spliced feature matrix, and using the third transformation neural network to perform dimensional transformation on the third spliced feature matrix to obtain the second target skeleton feature matrix;
    splicing the second intermediate skeleton feature matrix with the first target contour feature matrix to obtain a fourth spliced feature matrix, and using the fourth transformation neural network to perform dimensional transformation on the fourth spliced feature matrix to obtain the second target contour feature matrix.
  29. The human body detection apparatus according to claim 24, characterized in that the feature fusion neural network comprises: a displacement estimation neural network and a fifth transformation neural network;
    the detection module is configured to fuse the first target skeleton feature matrix and the first target contour feature matrix with the feature fusion neural network in the following manner, to obtain the second target skeleton feature matrix and the second target contour feature matrix:
    splicing the first target skeleton feature matrix and the first target contour feature matrix to obtain a fifth spliced feature matrix;
    inputting the fifth spliced feature matrix into the displacement estimation neural network and performing displacement estimation on multiple predetermined groups of keypoint pairs, obtaining displacement information for moving one keypoint of each pair to the other keypoint; taking each keypoint of each pair in turn as a current keypoint and obtaining, from the three-dimensional feature matrix corresponding to the other keypoint paired with the current keypoint, the two-dimensional feature matrix corresponding to that paired keypoint;
    according to the displacement information from the paired keypoint to the current keypoint, performing a position transformation on the elements of the two-dimensional feature matrix corresponding to the paired keypoint, obtaining the displacement feature matrix corresponding to the current keypoint;
    for each skeleton keypoint, splicing the two-dimensional feature matrix corresponding to that skeleton keypoint with each displacement feature matrix corresponding to that skeleton keypoint, obtaining that skeleton keypoint's spliced two-dimensional feature matrix; inputting the spliced two-dimensional feature matrix of that skeleton keypoint into the fifth transformation neural network, obtaining the target two-dimensional feature matrix corresponding to that skeleton keypoint; and generating the second target skeleton feature matrix based on the target two-dimensional feature matrices corresponding to the individual skeleton keypoints;
    for each contour keypoint, splicing the two-dimensional feature matrix corresponding to that contour keypoint with each displacement feature matrix corresponding to that contour keypoint, obtaining that contour keypoint's spliced two-dimensional feature matrix; inputting the spliced two-dimensional feature matrix of that contour keypoint into the fifth transformation neural network, obtaining the target two-dimensional feature matrix corresponding to that contour keypoint; and generating the second target contour feature matrix based on the target two-dimensional feature matrices corresponding to the individual contour keypoints.
  30. The human body detection apparatus according to any one of claims 16 to 29, characterized in that the human body detection function of the human body detection apparatus is implemented by a human body detection model; the human body detection model comprises: a first feature extraction network and/or a feature fusion neural network;
    the human body detection model is trained using sample images in a training sample set, the sample images being annotated with actual position information of the skeleton keypoints of the human skeletal structure and actual position information of the contour keypoints of the human body contour.
  31. A computer device, characterized by comprising: a processor, a non-transitory storage medium and a bus, the non-transitory storage medium storing machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the non-transitory storage medium through the bus, and the machine-readable instructions are executed by the processor to perform the steps of the method according to any one of claims 1 to 15.
  32. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program is run by a processor to perform the steps of the method according to any one of claims 1 to 15.
PCT/CN2020/087826 2019-09-27 2020-04-29 Human body detection method and apparatus, computer device and storage medium WO2021057027A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
AU2020335016A AU2020335016A1 (en) 2019-09-27 2020-04-29 Human detection method and apparatus, computer device and storage medium
KR1020207037358A KR20210038436A (ko) 2019-09-27 2020-04-29 인체 검출 방법, 장치, 컴퓨터 기기 및 저장 매체
JP2020572391A JP7101829B2 (ja) 2019-09-27 2020-04-29 人体検出方法、装置、コンピュータ機器及び記憶媒体
SG11202101794SA SG11202101794SA (en) 2019-09-27 2020-04-29 Human detection method and apparatus, computer device and storage medium
EP20853555.9A EP3828765A4 (en) 2019-09-27 2020-04-29 METHOD AND DEVICE FOR DETECTING HUMAN BODY, COMPUTER DEVICE AND STORAGE MEDIUM
US17/181,376 US20210174074A1 (en) 2019-09-27 2021-02-22 Human detection method and apparatus, computer device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910926373.4 2019-09-27
CN201910926373.4A CN110705448B (zh) 2019-09-27 2019-09-27 一种人体检测方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/181,376 Continuation US20210174074A1 (en) 2019-09-27 2021-02-22 Human detection method and apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2021057027A1 true WO2021057027A1 (zh) 2021-04-01

Family

ID=69196895

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087826 WO2021057027A1 (zh) 2019-09-27 2020-04-29 人体检测方法、装置、计算机设备及存储介质

Country Status (9)

Country Link
US (1) US20210174074A1 (zh)
EP (1) EP3828765A4 (zh)
JP (1) JP7101829B2 (zh)
KR (1) KR20210038436A (zh)
CN (1) CN110705448B (zh)
AU (1) AU2020335016A1 (zh)
SG (1) SG11202101794SA (zh)
TW (1) TWI742690B (zh)
WO (1) WO2021057027A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • CN113469018A (zh) * 2021-06-29 2021-10-01 中北大学 Multimodal interactive behavior recognition method based on RGB and 3D skeletons
  • CN113743257A (zh) * 2021-08-20 2021-12-03 江苏大学 Method for detecting instability during construction work at height by fusing spatio-temporal features
  • CN114519666A (zh) * 2022-02-18 2022-05-20 广州方硅信息技术有限公司 Live-streaming image correction method, apparatus, device and storage medium
  • CN115019386A (zh) * 2022-04-15 2022-09-06 北京航空航天大学 Deep-learning-based exercise-assisted training method
  • CN117315791A (zh) * 2023-11-28 2023-12-29 杭州华橙软件技术有限公司 Skeleton action recognition method, device and storage medium

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • CN110705448B (zh) * 2019-09-27 2023-01-20 北京市商汤科技开发有限公司 Human body detection method and apparatus
  • CN111291793B (zh) * 2020-01-20 2023-11-14 北京大学口腔医学院 Element classification method and apparatus for mesh surfaces, and storage medium
  • CN111476291B (zh) * 2020-04-03 2023-07-25 南京星火技术有限公司 Data processing method, apparatus and storage medium
  • CN111640197A (zh) * 2020-06-09 2020-09-08 上海商汤智能科技有限公司 Augmented reality (AR) special-effect control method, apparatus and device
  • CN113469221A (zh) * 2021-06-09 2021-10-01 浙江大华技术股份有限公司 Training method for an identity recognition model, identity recognition method, and related devices
  • CN113486751B (zh) * 2021-06-29 2023-07-04 西北大学 Pedestrian feature extraction method based on graph convolution and edge-weight attention
  • CN113837306B (zh) * 2021-09-29 2024-04-12 南京邮电大学 Abnormal behavior detection method based on a spatio-temporal graph model of human keypoints
  • CN114299288A (zh) * 2021-12-23 2022-04-08 广州方硅信息技术有限公司 Image segmentation method, apparatus, device and storage medium
  • CN115050101B (zh) * 2022-07-18 2024-03-22 四川大学 Gait recognition method based on fusion of skeleton and contour features
  • CN115273154B (zh) * 2022-09-26 2023-01-17 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Thermal-infrared pedestrian detection method and system based on edge reconstruction, and storage medium
  • CN115661138B (zh) * 2022-12-13 2023-03-21 北京大学第三医院(北京大学第三临床医学院) Human skeleton contour detection method based on DR images
  • CN116137074A (zh) * 2023-02-22 2023-05-19 常熟理工学院 Automatic detection method and system for passenger fighting behavior in elevator cars
  • CN116434335B (zh) * 2023-03-30 2024-04-30 东莞理工学院 Action sequence recognition and intention inference method, apparatus, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831380A (zh) * 2011-06-15 2012-12-19 康佳集团股份有限公司 一种基于深度图像感应的肢体动作识别方法及系统
US20120327194A1 (en) * 2011-06-21 2012-12-27 Takaaki Shiratori Motion capture from body mounted cameras
CN103679175A (zh) * 2013-12-13 2014-03-26 电子科技大学 一种基于深度摄像机的快速3d骨骼模型检测方法
CN110084161A (zh) * 2019-04-17 2019-08-02 中山大学 一种人体骨骼关键点的快速检测方法及系统
CN110197117A (zh) * 2019-04-18 2019-09-03 北京奇艺世纪科技有限公司 人体轮廓点提取方法、装置、终端设备及计算机可读存储介质
CN110705448A (zh) * 2019-09-27 2020-01-17 北京市商汤科技开发有限公司 一种人体检测方法及装置

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • JP4728795B2 (ja) * 2005-12-15 2011-07-20 日本放送協会 Person object determination device and person object determination program
  • US8428311B2 (en) * 2009-02-25 2013-04-23 Honda Motor Co., Ltd. Capturing and recognizing hand postures using inner distance shape contexts
  • EP2674913B1 (en) * 2012-06-14 2014-07-23 Softkinetic Software Three-dimensional object modelling fitting & tracking.
  • JP2014089665A (ja) * 2012-10-31 2014-05-15 Toshiba Corp Image processing device, image processing method, and image processing program
  • CN103955680B (zh) * 2014-05-20 2017-05-31 深圳市赛为智能股份有限公司 Action recognition method and apparatus based on shape context
  • CN104537608A (zh) * 2014-12-31 2015-04-22 深圳市中兴移动通信有限公司 Image processing method and apparatus
  • CN105550678B (zh) * 2016-02-03 2019-01-18 武汉大学 Human action feature extraction method based on globally salient edge regions
  • CN108229468B (zh) * 2017-06-28 2020-02-21 北京市商汤科技开发有限公司 Vehicle appearance feature recognition and vehicle retrieval method, apparatus, storage medium and electronic device
  • CN107705355A (zh) * 2017-09-08 2018-02-16 郭睿 3D human body modeling method and apparatus based on multiple pictures
  • CN108229308A (zh) * 2017-11-23 2018-06-29 北京市商汤科技开发有限公司 Target object recognition method and apparatus, storage medium and electronic device
  • CN108038469B (zh) * 2017-12-27 2019-10-25 百度在线网络技术(北京)有限公司 Method and apparatus for detecting human bodies
  • CN110059522B (zh) * 2018-01-19 2021-06-25 北京市商汤科技开发有限公司 Human contour keypoint detection method, image processing method, apparatus and device
  • CN109508625A (zh) * 2018-09-07 2019-03-22 咪咕文化科技有限公司 Emotion data analysis method and apparatus
  • CN109242868B (zh) * 2018-09-17 2021-05-04 北京旷视科技有限公司 Image processing method and apparatus, electronic device and storage medium
  • US11335027B2 (en) * 2018-09-28 2022-05-17 Hewlett-Packard Development Company, L.P. Generating spatial gradient maps for a person in an image
  • CN109255783B (zh) * 2018-10-19 2020-09-25 上海摩象网络科技有限公司 Method for detecting the positional arrangement of human skeleton keypoints in multi-person images
  • CN109902659B (zh) * 2019-03-15 2021-08-20 北京字节跳动网络技术有限公司 Method and apparatus for processing human body images
  • CN110111418B (zh) * 2019-05-15 2022-02-25 北京市商汤科技开发有限公司 Method and apparatus for creating a face model, and electronic device
  • CN110135375B (zh) * 2019-05-20 2021-06-01 中国科学院宁波材料技术与工程研究所 Multi-person pose estimation method based on global information integration

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • CN102831380A (zh) * 2011-06-15 2012-12-19 康佳集团股份有限公司 Limb action recognition method and system based on depth image sensing
  • US20120327194A1 (en) * 2011-06-21 2012-12-27 Takaaki Shiratori Motion capture from body mounted cameras
  • CN103679175A (zh) * 2013-12-13 2014-03-26 电子科技大学 Fast 3D skeleton model detection method based on a depth camera
  • CN110084161A (zh) * 2019-04-17 2019-08-02 中山大学 Fast detection method and system for human skeleton keypoints
  • CN110197117A (zh) * 2019-04-18 2019-09-03 北京奇艺世纪科技有限公司 Human contour point extraction method, apparatus, terminal device and computer-readable storage medium
  • CN110705448A (zh) * 2019-09-27 2020-01-17 北京市商汤科技开发有限公司 Human body detection method and apparatus

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • CN113469018A (zh) * 2021-06-29 2021-10-01 中北大学 Multimodal interactive behavior recognition method based on RGB and 3D skeletons
  • CN113469018B (zh) * 2021-06-29 2024-02-23 中北大学 Multimodal interactive behavior recognition method based on RGB and 3D skeletons
  • CN113743257A (zh) * 2021-08-20 2021-12-03 江苏大学 Method for detecting instability during construction work at height by fusing spatio-temporal features
  • CN113743257B (zh) * 2021-08-20 2024-05-14 江苏大学 Method for detecting instability during construction work at height by fusing spatio-temporal features
  • CN114519666A (zh) * 2022-02-18 2022-05-20 广州方硅信息技术有限公司 Live-streaming image correction method, apparatus, device and storage medium
  • CN114519666B (zh) * 2022-02-18 2023-09-19 广州方硅信息技术有限公司 Live-streaming image correction method, apparatus, device and storage medium
  • CN115019386A (zh) * 2022-04-15 2022-09-06 北京航空航天大学 Deep-learning-based exercise-assisted training method
  • CN117315791A (zh) * 2023-11-28 2023-12-29 杭州华橙软件技术有限公司 Skeleton action recognition method, device and storage medium
  • CN117315791B (zh) * 2023-11-28 2024-02-20 杭州华橙软件技术有限公司 Skeleton action recognition method, device and storage medium

Also Published As

Publication number Publication date
US20210174074A1 (en) 2021-06-10
TWI742690B (zh) 2021-10-11
EP3828765A1 (en) 2021-06-02
CN110705448A (zh) 2020-01-17
JP7101829B2 (ja) 2022-07-15
SG11202101794SA (en) 2021-04-29
CN110705448B (zh) 2023-01-20
TW202112306A (zh) 2021-04-01
JP2022503426A (ja) 2022-01-12
KR20210038436A (ko) 2021-04-07
AU2020335016A1 (en) 2021-04-15
EP3828765A4 (en) 2021-12-08

Similar Documents

Publication Publication Date Title
WO2021057027A1 (zh) Human body detection method and apparatus, computer device and storage medium
CN108596974B (zh) Dynamic-scene robot localization and mapping system and method
EP3971841A1 (en) Three-dimensional model generation method and apparatus, and computer device and storage medium
CN110832501A (zh) System and method for pose-invariant face alignment
US11417095B2 (en) Image recognition method and apparatus, electronic device, and readable storage medium using an update on body extraction parameter and alignment parameter
CN110399789B (zh) Pedestrian re-identification method, model construction method, apparatus, device and storage medium
CN108921926A (zh) End-to-end 3D face reconstruction method based on a single image
CN110288614A (zh) Image processing method, apparatus, device and storage medium
CN109325995B (zh) Low-resolution multi-view hand reconstruction method based on a parametric hand model
CN112560648B (zh) SLAM method based on RGB-D images
CN109948441B (zh) Model training and image processing methods and apparatus, electronic device and computer-readable storage medium
CN112734890A (zh) Face replacement method and apparatus based on 3D reconstruction
WO2022052782A1 (zh) Image processing method and related devices
Durasov et al. Double refinement network for efficient monocular depth estimation
CN110321452A (zh) Image retrieval method based on a direction-selection mechanism
CN116863044A (zh) Face model generation method and apparatus, electronic device and readable storage medium
CN113592021B (zh) Stereo matching method based on deformable and depthwise-separable convolutions
KR20230078502A (ko) Image processing apparatus and method
CN109741245A (zh) Method and apparatus for inserting plane information
CN117252914A (zh) Training method and apparatus for a depth estimation network, electronic device and storage medium
TWI728791B (zh) Image semantic segmentation method and apparatus, and storage medium
CN110189247B (zh) Image generation method, apparatus and system
CN116452742B (zh) Spatial layout analysis method and system for aerospace operation scenes
US20240193728A1 (en) Method and electronic device for training image processing model and method and electronic device for processing images using image processing model
WO2024121900A1 (en) Key-point associating apparatus, key-point associating method, and non-transitory computer-readable storage medium

Legal Events

ENP (Entry into the national phase): Ref document number: 2020572391; Country of ref document: JP; Kind code of ref document: A
ENP (Entry into the national phase): Ref document number: 2020853555; Country of ref document: EP; Effective date: 20210224
ENP (Entry into the national phase): Ref document number: 2020335016; Country of ref document: AU; Date of ref document: 20200429; Kind code of ref document: A
121 (Ep: the EPO has been informed by WIPO that EP was designated in this application): Ref document number: 20853555; Country of ref document: EP; Kind code of ref document: A1
NENP (Non-entry into the national phase): Ref country code: DE