WO2019232894A1 - Complex scene-based human body key point detection system and method - Google Patents


Info

Publication number
WO2019232894A1
Authority
WO
WIPO (PCT)
Prior art keywords
bounding box
confidence
human
target
map
Prior art date
Application number
PCT/CN2018/096157
Other languages
French (fr)
Chinese (zh)
Inventor
宫法明
马玉辉
徐燕
袁向兵
宫文娟
李传涛
岳寒冰
丁洪金
Original Assignee
中国石油大学(华东) (China University of Petroleum, East China)
Priority date
Filing date
Publication date
Application filed by 中国石油大学(华东) (China University of Petroleum, East China)
Publication of WO2019232894A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • the invention relates to a technology for detecting key points of a human body, in particular to a system and method for detecting key points of a human body based on a complex scene.
  • Computer video surveillance uses computer vision and image processing methods to perform target detection, target classification, target tracking, and behavior recognition of human targets in surveillance scenarios.
  • human behavior recognition is a research hotspot that has received extensive attention in recent years
  • human keypoint detection is the basis and key technology of intelligent video behavior recognition. It analyzes and judges target behaviors through human keypoint sequences, realizes active detection of hidden dangers, and provides early warning of abnormal events in public places. It has important practical application value in oilfields, hospitals, and homes for the elderly.
  • Human keypoint detection is to identify and locate key parts of human targets in the image. With the popularization of deep convolutional neural networks, this problem is further solved.
  • the detection methods of human key points are mainly divided into two categories: top-down methods and bottom-up methods.
  • the top-down method first detects each person target, uses the target bounding box for localization, and then applies single-person pose estimation to locate all joints of that person;
  • the bottom-up method first locates all joints in the image, then determines which person each joint belongs to, and finally assembles the joints into complete human poses.
  • the former is suitable for the situation where the human target is sparse, and the latter is suitable for the situation where the human target is dense.
  • Traditional human keypoint detection methods include template-based methods, statistical classification-based methods, and sliding window-based methods.
  • the template-matching-based method is intuitive and simple, but it lacks robustness and is generally limited to a single scene.
  • the probability-statistics-based method is widely used, but it requires a large amount of training data to learn model parameters and is computationally complex.
  • the sliding-window-based method has low labeling requirements for the training database, but it can neither overcome partial occlusion nor model the relative positional relationships between body parts.
  • the traditional methods work well in a single specific scene, but in complex scenes they are strongly affected by background changes, and human body parts are easily occluded by and confused with other objects, making it difficult to guarantee the accuracy and completeness of human keypoint detection.
  • An object of the present invention is to provide a human body key point detection system and method based on complex scenes.
  • the system and method solve the prior-art problems of poor detection performance and large errors for human body key points in complex scenes; they can be used for human keypoint detection in complex scenes to locate, identify, and track human targets in dynamic scenes, achieving accurate detection of the key points of all human targets in the image.
  • the present invention provides a method for detecting a key point of a human body based on a complex scene.
  • the method includes:
  • S200: extract features from a single-frame static map by convolution operations to obtain feature maps.
  • a human target detection algorithm is used to discriminate the actual confidence of the feature map against a preset confidence, remove non-human targets, and obtain discretized human target bounding boxes;
  • the target bounding box is enlarged and the original image is used as input; features are extracted by a convolution operation, and a classifier predicts the confidence value of each part directly from the original image to generate a corresponding confidence map; the confidence map obtained in the previous stage and the extracted features are then used as inputs to the next stage, iterating over several stages to obtain an accurate part confidence map.
  • the human target detection algorithm includes:
  • a small-kernel convolution filter is used to predict the actual bounding box of the object in each default bounding box; the actual bounding box serves as the target bounding box, its actual confidence is calculated, and the actual confidence is discriminated against the preset confidence to remove invalid bounding boxes and correct the target bounding box position;
  • the human keypoint detection algorithm flow includes:
  • each person's body parts are joined together through a two-dimensional vector field to form a complete human body; when multiple people overlap at a point, the vectors of the n people are summed and divided by the number of people.
  • FIG. 1 is a flowchart of a human body keypoint detection method based on a complex scene of the present invention.
  • (S212) Use a small convolution kernel convolution filter on each feature map unit to predict the actual bounding box of the object in each default bounding box.
  • the actual bounding box is used as the target bounding box, and the actual confidence is calculated.
  • the actual confidence is discriminated against the preset confidence; the confidence threshold can be set to 0.6: when the actual confidence exceeds the threshold, the model loss calculation is performed; when it is below the threshold, SVM posterior discrimination is performed directly.
  • to determine the actual bounding box, the video stream is processed as static images: the input image data set is labeled using deep learning techniques, the labeled data set is used to train a human target detection model, and this model performs human target detection on static images to obtain the specific position information of each target; the position information is then used as input to obtain the target bounding box, which provides the data source for human keypoint extraction.
  • a corresponding data set is selected, for example an image data set of an offshore oil platform, and the labeled image data set is used for training.
  • the deep learning SSD framework is used.
  • in step (S212), during confidence discrimination, the error and the corresponding score between each default bounding box and the corresponding actual bounding box are calculated to predict the categories and confidences of all targets in the region; a category whose confidence exceeds the threshold above is taken as the target category.
  • the actual bounding box needs to be matched with multiple default bounding boxes in the image, and the final result is the modified target bounding box.
  • the confidence discrimination is a preliminary screening process of target detection.
  • the default bounding box is matched with any actual bounding box with a value higher than the threshold, and the matching process is simplified by SVM posterior discrimination.
  • the algorithm allows predicting the scores of multiple overlapping default bounding boxes, instead of picking only the bounding box with the largest overlap for score estimation.
  • in step (S212), the model loss calculation is completed by a loss function; the most commonly used loss function is the squared-error function, in which L(e) is the loss error, y is the expected output, and α is the actual output; y_{i,n} denotes the expected output of the i-th default bounding box when the number of matching default bounding boxes is n, and α_{i,n} denotes the corresponding actual output.
  • conventional models trained for simple scenes often confuse human targets with cylindrical pipeline targets, leading to a higher false-positive rate.
  • the two types of targets are therefore subjected to SVM posterior discrimination: a large number of manually labeled images are fed into an SVM classifier pre-trained on human targets and cylindrical pipeline targets, and after the confidence discrimination a local SVM two-class discrimination is performed, removing identified cylindrical pipelines as negative samples.
  • the overall objective loss function through double discrimination is the weighted average sum of the confidence loss and the localized scoring loss.
  • the overall objective loss function is as follows:
  • the initial weight term δ is set to 1 through cross-validation.
  • the output is the confidence C of each class, and the confidence loss function L(α, c) is as follows:
  • the overall objective loss function lets the localization scoring loss function approach a global minimum gradually, so that the difference in scores is minimal and the prediction is more accurate; the target bounding box is thereby adjusted to better match the shape of the target object.
  • Body part positioning and correlation analysis are performed simultaneously on two branches.
  • the former finds all 14 key points: head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, and left ankle; the latter finds the degree of correlation between all parts to establish their relative positional relationships;
  • the body part localization algorithm consists of a series of predictors divided into multiple stages; each stage repeatedly generates a confidence map for each part of the human body, each confidence map containing one type of key point; the confidence maps and the original image features are also used as inputs to the next stage to predict the positions of the parts, and thus determine the positions of the key points of the human body;
  • (S413) encode the position and direction of each human body part, and resolve which person each key point belongs to in the multi-person case through the direction of the vectors in the two-dimensional vector field;
  • in step S412, the confidence maps at all scales are accumulated for each part to obtain a total confidence map, and the point with the highest confidence is found, which is the position of the corresponding key point.
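The accumulate-then-argmax step of S412 can be sketched in pure Python; the grids and values below are illustrative, not from the patent:

```python
# Sketch: accumulate per-scale confidence maps for one body part and take
# the argmax of the total map as the key point location.

def accumulate_confidence_maps(maps):
    """Element-wise sum of equally sized 2-D confidence maps."""
    h, w = len(maps[0]), len(maps[0][0])
    total = [[0.0] * w for _ in range(h)]
    for m in maps:
        for y in range(h):
            for x in range(w):
                total[y][x] += m[y][x]
    return total

def peak_location(conf_map):
    """Return (row, col) of the highest-confidence point."""
    best, best_yx = float("-inf"), (0, 0)
    for y, row in enumerate(conf_map):
        for x, v in enumerate(row):
            if v > best:
                best, best_yx = v, (y, x)
    return best_yx

scale_a = [[0.1, 0.2], [0.3, 0.1]]
scale_b = [[0.0, 0.6], [0.2, 0.1]]
total = accumulate_confidence_maps([scale_a, scale_b])
keypoint = peak_location(total)   # (0, 1), where the summed confidence peaks
```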
  • the human keypoint detection algorithm performs feature extraction on the input image at each scale to obtain the confidence map of each part of the human body.
  • the algorithm of the present invention uses the confidence map of each part to express the spatial constraints between parts, and processes the input feature maps and response maps at multiple scales simultaneously; this ensures accuracy while accounting for the distance relationships between parts, and by continuously expanding the network's receptive field to detect the locations of other parts, it eventually achieves accurate detection of all key points of the human body.
  • to avoid the problem that the human target bounding box obtained by target detection may carry a positional error within a certain range, so that part of the human target is not fully contained in the bounding box, the embodiment of the present invention adopts a multi-scale method to expand the perceptual field and reduce errors caused by target detection.
  • the original bounding box is enlarged at a ratio of 1.0:1.2; in this way a complete human target is obtained, so that all keypoint coordinates can be detected during the human keypoint detection stage.
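The 1.0:1.2 enlargement can be sketched as scaling the box about its center; the (x_min, y_min, x_max, y_max) box format is an assumption for illustration:

```python
# Sketch of the 1.0 : 1.2 bounding-box enlargement: the box is scaled about
# its centre so a slightly misplaced detection still covers the whole person.

def enlarge_box(box, ratio=1.2):
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half_w = (x_max - x_min) / 2.0 * ratio
    half_h = (y_max - y_min) / 2.0 * ratio
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

box = enlarge_box((10, 20, 110, 220))   # a 100x200 box becomes 120x240
```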
  • the confidence value of each part is directly predicted from the original image, thereby generating a corresponding confidence map, which includes a background confidence map.
  • the preset value of P is 14.
  • x is a pixel with salient features in the image.
  • the original image is input into the network and the salient features in the image are extracted through a convolution operation.
  • the salient features are mainly texture features.
  • C_1 denotes the classifier in the first stage; this classifier roughly predicts the position of each part, thereby generating a confidence map for each part.
  • the classifier structure is as follows:
  • the confidence map and image features obtained in the first stage are used as input data in the second stage, and the original image is used as input again.
  • the feature functions used include the image data features, the confidence map of each part at this stage, and the context information of the classifiers at all levels.
  • the classifier C 2 continues to predict the position of each part, which is a modification of the predicted position in the previous stage.
  • the overall target F (t) is as follows:
  • equation (7) denotes that the ideal confidence is obtained at stage t ∈ T.
  • using the optical flow method, an optical flow threshold can be set, the effective motion areas in the video extracted, and the video fragments containing human targets screened for single-frame image conversion.
  • a hash-function calculation is set every 24 frames: a random function is selected each time, the frame number of each frame is taken as its hash address, and a randomly generated frame number is obtained, which identifies the frame to extract.
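One reading of this per-24-frame random selection is sketched below; the seeded random generator and the use of the frame index as the "hash address" are illustrative assumptions:

```python
# Sketch: within every 24-frame window, pick one frame number at random.
import random

def sample_frames(total_frames, window=24, seed=0):
    rng = random.Random(seed)            # deterministic for reproducibility
    picks = []
    for start in range(0, total_frames, window):
        end = min(start + window, total_frames)
        picks.append(rng.randrange(start, end))
    return picks

picks = sample_frames(72)                # one frame per 24-frame window
```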
  • I_x, I_y, I_z, and I_t are the partial derivatives of I(x, y, z, t) with respect to x, y, z, and t;
  • V_x, V_y, and V_z are the x, y, and z components of the optical flow vector of I(x, y, z, t);
  • the three partial derivatives are approximated by the corresponding directional differences of the image at pixel (x, y, z, t).
  • a method for forming a two-dimensional vector field is specifically: obtaining an optical flow map by continuously extracting multiple frames at time t, and assigning a velocity vector to each pixel in the image to form a motion vector field, which is obtained through a preprocessing operation A two-dimensional vector field formed by optical flow displacement between successive frames.
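The optical-flow stacking described above can be sketched as accumulating per-frame displacement fields pixel-wise into one vector field; the nested-list representation is a pure-Python stand-in for real flow maps:

```python
# Sketch of "optical flow stacking": per-frame (dx, dy) displacement fields
# between successive frames are summed pixel-wise into a single
# two-dimensional vector field.

def stack_flows(flows):
    """flows: list of HxW grids of (dx, dy) tuples; returns accumulated field."""
    h, w = len(flows[0]), len(flows[0][0])
    field = [[(0.0, 0.0)] * w for _ in range(h)]
    for flow in flows:
        for y in range(h):
            for x in range(w):
                dx, dy = field[y][x]
                fdx, fdy = flow[y][x]
                field[y][x] = (dx + fdx, dy + fdy)
    return field

flow_t0 = [[(1.0, 0.0)]]                 # displacement between frames t0 -> t1
flow_t1 = [[(0.5, 2.0)]]                 # displacement between frames t1 -> t2
field = stack_flows([flow_t0, flow_t1])  # accumulated motion vector field
```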
  • a detection area is set in the video; the method of the present invention performs target detection in the complex scene, locating, identifying, and tracking human targets, and performs loitering detection when the same human target moves within the area for a certain time; this can be used for intelligent monitoring of banks, government agencies, embassies, cultural and religious gathering places, high-security perimeters, and commercial and residential areas, finding suspicious targets and issuing timely warnings to eliminate potential security risks.
  • the method of the present invention can accurately identify and locate the key points of the human body and, on this basis, judge personnel behavior and posture; it can be applied to many fields such as petroleum, industry, medical care, and security, which face many hidden safety hazards, for example personnel accidentally falling into the sea during offshore oil drilling and production, industrial workers failing to wear safety equipment as required, and falls of the elderly and patients.
  • the method of the invention can reduce the time of manual intervention, avoid economic losses caused by personal accidents and illegal operation of production, thereby ensuring the safety of industrial production, saving manpower and material resources, and improving the level of production management.
  • the human target detection module likewise adopts the steps of the human target detection algorithm of the human keypoint detection method in complex scenes described above.

Abstract

Disclosed are a complex scene-based human body key point detection system and method. The method comprises: inputting surveillance video information to obtain a single-frame static map and multi-frame optical flow maps; extracting features from the single-frame static map by a convolution operation to obtain a feature map and, to counter the impact of interfering targets on human target detection in complex scenes, discriminating the actual confidence of the feature map against a preset confidence with a human target detection algorithm to obtain discretized human target bounding boxes; stacking the multi-frame optical flow maps to form a two-dimensional vector field; and extracting features within the discretized human target bounding boxes to obtain feature maps, obtaining the key points and association degrees of the parts, generating a part confidence map for each body part with a predictor, and precisely detecting human body key points by means of the part confidence maps and the two-dimensional vector field. The system and method are used for human body key point detection in complex scenes, achieving precise detection of the key points of human targets.

Description

Human body key point detection system and method based on complex scenes

Technical Field
The invention relates to a human body keypoint detection technology, and in particular to a human body keypoint detection system and method for complex scenes.
Background Art
At present, China's "Skynet" surveillance project has begun to take shape. With the maturation of advanced technologies such as deep learning and intelligent video behavior analysis, how to use surveillance video effectively has become the focus of video data analysis.
Computer video surveillance uses computer vision and image processing methods to perform target detection, target classification, and target tracking on image sequences, and to recognize the behavior of human targets in the monitored scene. Among these, human behavior recognition is a research hotspot that has received extensive attention in recent years, and human keypoint detection is the foundation and core technology of intelligent video behavior recognition. Analyzing and judging target behavior through human keypoint sequences enables active discovery of hidden safety hazards and early warning of abnormal events in public places, with important practical application value in oilfields, hospitals, homes for the elderly, and similar settings.
Human keypoint detection identifies and locates the key parts of human targets in an image; with the spread of deep convolutional neural networks, this problem has been further addressed. Detection methods fall into two main categories: top-down and bottom-up. The top-down approach first detects each person target, localizes it with a bounding box, and then uses single-person estimation to locate all joints of the body; the bottom-up approach first locates all joints, then determines which person each joint belongs to, and finally assembles the joints into a complete human pose. The former suits scenes where human targets are sparse, the latter scenes where they are dense.
Traditional human keypoint detection methods include template-matching-based, statistical-classification-based, and sliding-window-based methods. Template matching is intuitive and simple but lacks robustness and is generally used in a single scene; probability-statistics methods are widely used but require large amounts of training data to learn model parameters and are computationally complex; sliding-window methods have low labeling requirements for the training database but cannot overcome partial occlusion or model the relative positional relationships between body parts.
In summary, owing to the non-rigid nature of the human body, the variability of posture, and illumination changes, traditional methods perform well in a single specific scene but are strongly affected by background changes in complex scenes, where body parts are easily occluded by and confused with other objects, making it difficult to guarantee the accuracy and completeness of human keypoint detection.
Summary of the Invention
An object of the present invention is to provide a human body keypoint detection system and method for complex scenes. The system and method solve the prior-art problems of poor detection performance and large errors for human keypoints in complex scenes; they can be used for human keypoint detection in complex scenes to locate, identify, and track human targets in dynamic scenes, achieving accurate detection of the keypoints of all human targets in the image.
To achieve the above object, the present invention provides a method for detecting human body keypoints in complex scenes, the method comprising:
(S100) inputting surveillance video information and preprocessing it to obtain a single-frame static map and multi-frame optical flow maps;
(S200) extracting features from the single-frame static map by convolution operations to obtain a feature map; to counter the effect of interfering targets on human target detection in complex scenes, a human target detection algorithm discriminates the actual confidence of the feature map against a preset confidence, removes non-human targets, and obtains discretized human target bounding boxes;
(S300) applying optical flow stacking to the multi-frame optical flow maps to form a two-dimensional vector field;
(S400) extracting features within the discretized human target bounding boxes to obtain feature maps, obtaining the keypoints of each part and their degrees of association, generating a part confidence map for each body part with a predictor, and achieving accurate detection of human keypoints through the part confidence maps and the two-dimensional vector field.
In step S400, in the first stage, the target bounding box is enlarged and the original image is used as input; after features are extracted by a convolution operation, a classifier predicts the confidence value of each part directly from the original image, producing the corresponding confidence map; the confidence map obtained in the previous stage and the extracted features are then used as inputs to the next stage, iterating over several stages to obtain an accurate part confidence map.
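Steps S100 to S400 can be outlined as a minimal pipeline; every function below is a placeholder stub standing in for the patent's components, with made-up names and data:

```python
# Illustrative outline of steps S100-S400 (stubs only, not the real models).

def preprocess(video):                      # S100: split into frames and flows
    return {"static": video, "flows": video}

def detect_humans(static_frames, thr=0.6):  # S200: confidence-filtered boxes
    return [b for b in static_frames if b["conf"] >= thr]

def build_vector_field(flows):              # S300: stack optical flows
    return {"field": flows}

def detect_keypoints(boxes, field):         # S400: per-part confidence maps
    return [{"box": b, "keypoints": 14} for b in boxes]

frames = [{"conf": 0.9}, {"conf": 0.3}]     # toy "detections" per frame
data = preprocess(frames)
boxes = detect_humans(data["static"])
result = detect_keypoints(boxes, build_vector_field(data["flows"]))
```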
Preferably, the human target detection algorithm includes:
(S210) generating a set of fixed-size default bounding boxes for single-frame static maps of different sizes, and extracting features from the regions within these default bounding boxes;
(S211) extracting the main features of the human target's physical shape to form feature map units at different levels; as the image data set, the feature map units at each level are tiled into feature maps by convolution, so that the position of each default bounding box relative to its feature map unit is fixed;
(S212) on each feature map unit, using a small-kernel convolution filter to predict the actual bounding box of the object in each default bounding box; the actual bounding box serves as the target bounding box, its actual confidence is calculated, and the actual confidence is discriminated against the preset confidence to remove invalid bounding boxes and correct the target bounding box position;
(S213) outputting discretized target bounding boxes at different levels, with different aspect-ratio scales.
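The default-box mechanics of S210 to S213 can be sketched as generating boxes of several aspect ratios per feature-map cell and matching them to ground truth by intersection-over-union; the scale, ratios, and helper names are illustrative assumptions, not values from the patent:

```python
# Sketch: SSD-style default boxes per feature-map cell, plus IoU matching.

def default_boxes(cell_cx, cell_cy, scale, ratios=(1.0, 2.0, 0.5)):
    """One default box per aspect ratio, centred on the cell."""
    boxes = []
    for r in ratios:
        w, h = scale * r ** 0.5, scale / r ** 0.5
        boxes.append((cell_cx - w / 2, cell_cy - h / 2,
                      cell_cx + w / 2, cell_cy + h / 2))
    return boxes

def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

boxes = default_boxes(0.5, 0.5, 0.4)
overlap = iou(boxes[0], (0.3, 0.3, 0.7, 0.7))   # the ratio-1.0 box coincides
```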
Preferably, in step S212, during confidence discrimination, the error and the corresponding score between each default bounding box and the corresponding actual bounding box are calculated to predict the categories and confidences of all targets in the default bounding box region; a threshold is set for the preset confidence; when the actual confidence exceeds the threshold, the model loss calculation is performed; when it is below the threshold, SVM posterior discrimination is performed; a target judged to be human has its bounding box fine-tuned, while a target judged non-human has its bounding box removed as invalid.
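The double-discrimination routing above can be sketched as a simple branch on the confidence threshold; `svm_is_human` is a placeholder for the trained human-vs-pipe SVM, not the patent's actual classifier:

```python
# Sketch: detections above the confidence threshold go to the model-loss
# branch; the rest go to an SVM posterior check that keeps humans and
# discards non-human boxes.

def route_detections(detections, thr=0.6, svm_is_human=lambda d: d["svm"] > 0):
    to_loss, kept, discarded = [], [], []
    for det in detections:
        if det["conf"] >= thr:
            to_loss.append(det)          # model-loss calculation branch
        elif svm_is_human(det):
            kept.append(det)             # human: fine-tune its bounding box
        else:
            discarded.append(det)        # non-human: remove invalid box
    return to_loss, kept, discarded

dets = [{"conf": 0.9, "svm": 1}, {"conf": 0.4, "svm": 1}, {"conf": 0.2, "svm": -1}]
to_loss, kept, discarded = route_detections(dets)
```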
Preferably, the model loss calculation is completed by a loss function, the loss function being:
L(e) = (1/2)(y − α)²    (1)
In equation (1), L(e) is the loss error, y is the expected output, and α is the actual output.
Performing moment estimation on the distribution of y, the cross entropy of y expressed through α is:
H_i = −[y_i ln α_i + (1 − y_i) ln(1 − α_i)]    (2)
In equation (2), α_i is the actual output of the i-th default bounding box, and y_i is its expected output.
The average cross entropy of the n default bounding boxes is:
L = −(1/n) Σ_{i=1}^{n} [y_{i,n} ln α_{i,n} + (1 − y_{i,n}) ln(1 − α_{i,n})]    (3)
In equation (3), y_{i,n} denotes the expected output of the i-th default bounding box when the number of matching default bounding boxes is n, and α_{i,n} the corresponding actual output.
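One common reading of the average cross entropy in equation (3), the binary form over n boxes, can be checked numerically; the sample outputs below are made up:

```python
# Sketch: average binary cross entropy over n default boxes (natural log
# assumed), matching one plausible reading of equation (3).
import math

def avg_cross_entropy(expected, actual):
    n = len(expected)
    total = 0.0
    for y, a in zip(expected, actual):
        total += y * math.log(a) + (1 - y) * math.log(1 - a)
    return -total / n

loss = avg_cross_entropy([1.0, 0.0], [0.9, 0.2])   # two toy boxes
```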
Preferably, in step S212, when confusable targets exist, SVM posterior discrimination is applied to the human targets and the confusable targets: a large number of manually labeled images are fed into an SVM classifier pre-trained on human targets and the confusable targets, a local SVM two-class discrimination is performed after the confidence discrimination, identified confusable targets are removed as negative samples, and human targets are kept as positive samples; based on the confidence of the positive-sample person category, a score then determines whether it is a real human target.
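A toy stand-in for the SVM posterior check is a linear decision function f(x) = w·x + b whose sign separates the two classes; the weights and feature vectors below are invented for illustration (the patent trains the SVM on labeled platform imagery):

```python
# Sketch: linear SVM decision function separating "human" (positive side)
# from a confusable target such as a cylindrical pipe (negative side).

def svm_decision(x, w=(1.0, -1.0), b=0.0):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def is_human(x):
    return svm_decision(x) > 0           # positive side = human target

human_like = (0.8, 0.1)   # e.g. high limb-texture score, low "tube" score
pipe_like = (0.1, 0.9)
```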
Preferably, the overall objective loss function of the double discrimination is the weighted average sum of the confidence loss and the localization scoring loss:
L(α, c, f) = (1/V) [L(α, c) + δ L(α, f)]    (4)
In equation (4), δ is the initial weight term; V is the number of default bounding boxes matched to actual bounding boxes; L(α, c) is the confidence loss function; L(α, f) is the localization scoring loss function.
The initial weight term δ is set to 1 through cross-validation; when the expected output is evaluated by confidence, the output is the confidence C of each class, and the confidence loss function L(α, c) is:
L(α, c) = −(1/N) Σ_{i=1}^{N} [y_{i,N} ln α_{i,N} + (1 − y_{i,N}) ln(1 − α_{i,N})]    (5)
In equation (5), y_{i,N} denotes the expected output of the i-th default bounding box when the number of matching default bounding boxes is N, and α_{i,N} the corresponding actual output.
当V=0时,所述的置信度损失为0。When V = 0, the confidence loss is zero.
When $\alpha_{ij}^{p}=1$, the $i$-th default bounding box matches the $j$-th ground-truth bounding box of class $p$; when $\alpha_{ij}^{p}=0$, it does not match. The localization scoring loss function is:
$$L(\alpha,f)=\sum_{i=1}^{V}\max\bigl(0,\;\lvert\hat{f}_{j}-f_{\alpha_i}\rvert-\Delta\bigr)\qquad(6)$$

In Equation (6), $\hat{f}_{j}$ is the score of the default bounding box matched to a ground-truth bounding box; $f_{j}$ is the preset score of the default bounding box; $f_{\alpha_i}$ is the actual score of the $\alpha_i$-th default bounding box; and $\Delta$ is the margin.
Preferably, the classifier $C_1$ of the first stage has the structure:

$$C_1(x_i)\;\rightarrow\;\bigl\{c_1^{p}(x_i)\;\bigm|\;p\in\{0,1,\ldots,P\}\bigr\}$$

where $Z$ denotes the pixel space of the image, $x_i\in Z$ is the position of each pixel in the image, $p$ denotes a specific model part, and $c_1^{p}(x_i)$ is the confidence value of part $p$ in the first stage.
By feeding the confidence maps obtained in the previous stage, together with the extracted features, into the next stage, the positions predicted in the previous stage are corrected. The overall objective $F(t)$ is:

$$F(t)=\sum_{p=1}^{P+1}\;\sum_{x_i\in Z}\bigl\lVert c_t^{p}(x_i)-c_{*}^{p}(x_i)\bigr\rVert^{2}\qquad(7)$$

In Equation (7), $c_{*}^{p}$ denotes the ideal confidence, attained at stage $t\in T$.
Preferably, in step S300, an optical-flow threshold is set for the multi-frame optical-flow maps by the optical-flow method, the effective motion regions in the video are extracted, and video segments containing human targets are screened out for conversion into single-frame images. A hash-function computation is performed once per fixed frame interval: a random function random is selected, and the frame number of each frame is taken as its hash address, so that the randomly generated frame number is the frame to extract.
The constraint equation of the multi-frame optical-flow map is transformed by the Taylor expansion into:

$$I_x V_x + I_y V_y + I_z V_z = -I_t \qquad (8)$$

In Equation (8), $I_x$, $I_y$, $I_z$ and $I_t$ are the components of $I(x,y,z,t)$ at $x$, $y$, $z$ and $t$, respectively; $V_x$, $V_y$ and $V_z$ are the $x$, $y$ and $z$ components of the optical-flow vector of $I(x,y,z,t)$; and $I(x,y,z,t)$ is the voxel at position $(x,y,z)$.
The two-dimensional vector field is formed as follows: an optical-flow map is obtained by continuously extracting multiple frames at time $t$; each pixel in the image is assigned a velocity vector, forming a motion vector field; and a preprocessing operation yields the stacked field of optical-flow displacements between consecutive frames, which forms the two-dimensional vector field.
Preferably, the human key-point detection algorithm comprises:

(S410) taking the discretized human-target bounding-box coordinates obtained by target detection as the initial input of the algorithm, and extracting features through convolution operations to obtain feature maps;

(S411) performing body-part localization and association analysis simultaneously on two branches: body-part localization finds all key points, and association analysis determines the degree of association between all parts, so as to establish relative positional relationships;

(S412) composing the body-part localization algorithm of predictors divided into several stages, each stage repeatedly generating a confidence map for each body part, each confidence map containing one kind of key point; each confidence map, together with the original image features, serves as input to the next stage, which predicts the position of each part and thereby determines the positions of the key points of the human body;

(S413) encoding the positions and orientations of the body parts, and resolving the assignment of key points among multiple people by the directions of the vectors in the two-dimensional vector field;

(S414) establishing the relative positional relationships between body parts from the displacement lengths between vectors, predicting and estimating invisible key points, and obtaining detailed information on all key points of the human body.

In step S412, the confidence maps at all scales are accumulated for each part to obtain a total confidence map, and the point with the highest confidence is found; that point is the position of the corresponding key point.

For multi-person key-point detection, the two-dimensional vector field assembles each person's body parts into a complete human body; when $n$ people overlap at a point, their vectors are summed and divided by the number of people.
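The averaging rule for overlapping people can be sketched as follows (a minimal illustration; the function name and the 2-D vector representation are assumptions, not from the original):

```python
def average_overlapping_vectors(vectors):
    """When n people overlap at a point, sum their 2-D vectors
    and divide by n, as the averaging rule specifies."""
    n = len(vectors)
    return [sum(v[0] for v in vectors) / n,
            sum(v[1] for v in vectors) / n]

# Two people overlap at one point with vectors (1, 0) and (0, 1):
print(average_overlapping_vectors([[1.0, 0.0], [0.0, 1.0]]))  # [0.5, 0.5]
```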
The present invention also provides a human key-point detection system for complex scenes, comprising: a data preprocessing module, which processes surveillance-video information to obtain single-frame static images and multi-frame optical-flow maps; a human-target detection module, which extracts features from the single-frame static images sent by the data preprocessing module through convolution operations, uses small-kernel convolution filters to predict the actual bounding box of the object in each default bounding box and computes the actual confidence, takes the actual bounding box as the target bounding box, applies SVM posterior discrimination to compare the actual confidence with the preset confidence, removes invalid bounding boxes, corrects the positions of the target bounding boxes, and obtains discretized human-target bounding boxes; and a human key-point detection module, which receives the discretized human-target bounding-box coordinates sent by the human-target detection module, extracts features through convolution operations to obtain feature maps, obtains the key points of the parts and their degrees of association, uses predictors to generate a part confidence map for each body part, and achieves accurate detection of human key points through the part confidence maps and the two-dimensional vector field.
The human key-point detection module operates iteratively over several stages, taking the confidence maps obtained in the previous stage together with the extracted features as input to the next stage, so that accurate part confidence maps are obtained by iterating between stages.
The human key-point detection system and method for complex scenes of the present invention solve the prior-art problems of poor detection performance and large errors when detecting human key points in complex scenes, and have the following advantages:
(1) The method and system of the present invention use a human-target detection algorithm to remove non-human targets, simplifying complex scenes, and can therefore be applied to accurate human key-point detection in complex scenes.

(2) The method and system use a two-dimensional vector field to encode the positions and orientations of human body parts in the image domain, resolving the assignment of key points among multiple people and achieving accurate detection of the key points of every human target in the image.

(3) The overall objective loss function used in the SVM posterior discrimination lets the localization scoring loss find a global minimum through a gradual process, minimizing the score difference and making the predictions more accurate, so that the target bounding box is adjusted to better match the shape of the target object.

(4) The method can also handle easily confused targets in special scenes; for example, on an offshore platform the color of the safety clothing of human targets matches the color and shape of certain cylindrical pipes, and such confusable targets are removed to improve recognition accuracy.

(5) During human key-point detection, the method expresses the spatial constraints between parts with per-part confidence maps while processing the input feature maps and response maps at multiple scales, which both ensures precision and accounts for the distance relationships between parts; by continually enlarging the receptive field of the network to detect the positions of other parts, accurate detection of all human key points is achieved.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flowchart of the human key-point detection method for complex scenes of the present invention.

FIG. 2 is a schematic diagram of the human key-point detection method for complex scenes of the present invention.

FIG. 3 is a flowchart of the human-target detection algorithm of the present invention.

FIG. 4 is a flowchart of the human key-point detection algorithm of the present invention.

FIG. 5 is a structural diagram of the human key-point detection system for complex scenes of the present invention.
DETAILED DESCRIPTION
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
A human key-point detection method for complex scenes is provided. FIG. 1 is a flowchart of the method and FIG. 2 is a schematic diagram of the method. The method comprises:
(S100) inputting surveillance-video information and preprocessing it to obtain single-frame static images and multi-frame optical-flow maps;

(S200) extracting features from the single-frame static images through convolution operations to obtain feature maps; to counter the effect of interfering targets on human-target detection in complex scenes, applying a human-target detection algorithm that compares the actual confidence of the feature maps with the preset confidence and removes non-human targets, yielding discretized human-target bounding boxes;

(S300) stacking the multi-frame optical-flow maps to form a two-dimensional vector field;

(S400) extracting features within the discretized human-target bounding boxes to obtain feature maps, obtaining the key points of the parts and their degrees of association, generating a part confidence map for each body part with predictors, and achieving accurate human key-point detection through the part confidence maps and the two-dimensional vector field.
In step S400, in the first stage, the target bounding box is enlarged and the original image is taken as input; after features are extracted by convolution operations, a classifier predicts the confidence value of each part from the original image, producing the corresponding confidence maps; the confidence maps obtained in each stage, together with the extracted features, serve as input to the next stage, and iterating between stages yields accurate part confidence maps.
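The bounding-box enlargement of the first stage can be sketched as follows (a minimal sketch: the 1.0:1.2 ratio used in the detailed embodiment is assumed here, and the `(x_min, y_min, x_max, y_max)` box format is illustrative):

```python
def expand_box(box, ratio=1.2):
    """Enlarge a bounding box about its centre by `ratio`, so that
    limbs truncated by the detector still fall inside the crop."""
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half_w = (x_max - x_min) * ratio / 2.0
    half_h = (y_max - y_min) * ratio / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

print(expand_box((100, 100, 200, 300)))  # (90.0, 80.0, 210.0, 320.0)
```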
FIG. 3 is a flowchart of the human-target detection algorithm of the present invention. The algorithm comprises:
(S210) generating a set of fixed-size default bounding boxes for single-frame static images of different sizes, and extracting features from the regions inside these default bounding boxes; for larger single-frame static images, several default bounding boxes are used for feature extraction;

(S211) extracting, for the physical appearance of human targets in complex scenes, color, shape and texture as the main features, to form feature-map units at different levels as the image data set, and tiling the feature maps of each level convolutionally so that the position of each default bounding box relative to its corresponding feature-map unit is fixed;

(S212) using a small-kernel convolution filter on each feature-map unit to predict the actual bounding box of the object in each default bounding box, taking this actual bounding box as the target bounding box, computing the actual confidence, and comparing it with the preset confidence; the confidence threshold may be set to 0.6, the model loss being computed when the confidence exceeds the threshold; when the confidence is below the threshold, SVM posterior discrimination is performed directly, and if the target is judged to be a person, the target bounding box is fine-tuned, otherwise the invalid bounding box is culled; specifically, the target bounding box is fine-tuned with a linear regressor to finely correct its position, and otherwise (when the target is judged not to be a person) it is treated as an invalid bounding box and culled;

(S213) outputting a series of discretized target bounding boxes at different levels, with different aspect-ratio scales.
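The decision rule of step S212 can be sketched as follows (a minimal sketch: `svm_is_person` is a stand-in for the pretrained person-vs-confuser classifier, and the box representation is illustrative):

```python
CONF_THRESHOLD = 0.6  # threshold value named in the text

def triage_boxes(predictions, svm_is_person):
    """Apply the S212 decision rule to (box, confidence) pairs:
    above the threshold -> model-loss computation; below it -> SVM
    posterior check, then either fine-tuning or culling."""
    to_loss, to_finetune, discarded = [], [], []
    for box, conf in predictions:
        if conf >= CONF_THRESHOLD:
            to_loss.append(box)          # kept; enters loss computation
        elif svm_is_person(box):
            to_finetune.append(box)      # refined later by a linear regressor
        else:
            discarded.append(box)        # invalid bounding box, culled
    return to_loss, to_finetune, discarded

preds = [("A", 0.9), ("B", 0.4), ("C", 0.2)]
print(triage_boxes(preds, lambda b: b == "B"))  # (['A'], ['B'], ['C'])
```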
To determine the actual bounding boxes, the video stream is processed as static images, the input image data set is labeled with deep-learning techniques, a human-target detection model is trained on the annotated image data set, human-target detection is performed on static images with this model to obtain the concrete position information of the targets, and the position information is used as input to obtain the target bounding boxes, providing the data source for human key-point extraction. In different scenarios, a corresponding data set is selected, for example an image data set of an offshore oil platform; the annotated image data set is used for training, with the deep-learning SSD framework.
It should further be noted that feature maps of different scales use default bounding boxes of different aspect ratios at each position. In step S212, during confidence discrimination, the error and the corresponding score between each default bounding box and the corresponding ground-truth bounding box are computed in order to predict the classes and confidences of all targets in the region; object classes above the confidence threshold are taken as target classes. Computing the errors and scores requires matching the ground-truth bounding box with several default bounding boxes in the image, and the final output is the corrected target bounding box.
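The matching of several default bounding boxes to one ground-truth box can be sketched with a standard intersection-over-union test (the 0.5 overlap threshold and the box format are assumptions for illustration; the patent does not state a numeric overlap threshold):

```python
def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def match_default_boxes(default_boxes, gt_box, threshold=0.5):
    """Keep every default box whose overlap with the ground-truth box
    exceeds the threshold (all such boxes match, not only the best one)."""
    return [i for i, d in enumerate(default_boxes) if iou(d, gt_box) > threshold]

defaults = [(0, 0, 10, 10), (0, 0, 5, 5), (20, 20, 30, 30)]
print(match_default_boxes(defaults, (0, 0, 10, 10)))  # [0]
```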
Moreover, confidence discrimination is the preliminary screening step of target detection: a default bounding box is matched by overlap to any ground-truth bounding box whose overlap exceeds the threshold, and the SVM posterior discrimination simplifies the matching process. In addition, the algorithm predicts scores for multiple overlapping default bounding boxes, rather than selecting only the bounding box with the largest overlap for score estimation.
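The SVM posterior discrimination step can be sketched with a linear decision function (a minimal sketch: the weight vector, bias and feature vectors are toy stand-ins; in the method they come from an SVM pre-trained on manually annotated images):

```python
def linear_svm_score(w, b, x):
    """Decision value of a linear SVM, f(x) = w . x + b; positive
    means 'person', negative means 'confuser' (e.g. a pipe-like column)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_posterior_filter(candidates, w, b):
    """Keep only candidate feature vectors the binary SVM labels
    positive; negatives are removed as confusable targets."""
    return [x for x in candidates if linear_svm_score(w, b, x) > 0]

w, b = (1.0, -1.0), 0.0
print(svm_posterior_filter([(2.0, 1.0), (1.0, 3.0)], w, b))  # [(2.0, 1.0)]
```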
Therefore, the human-target detection algorithm of the present invention combines predictions from multiple feature maps of different resolutions and naturally handles target objects of various sizes; compared with other single-stage methods, it maintains high accuracy even when the input image (the single-frame static image) is small.
It should further be noted that in step S212 the model loss is computed with a loss function; the most commonly used loss function is the squared-difference function:

$$L(e)=\frac{1}{2}(y-\alpha)^{2}\qquad(1)$$

In Equation (1), $L(e)$ is the loss error, $y$ is the expected output, and $\alpha$ is the actual output.
The larger the gap between the actual output and the expected output, the higher the model loss. In practice, however, the distribution of $y$ cannot be obtained exactly by computation; only a moment estimate of the distribution of $y$, namely the value of $\alpha$, is available, and $\alpha$ is used to express the cross-entropy of $y$:

$$L=-\bigl[y_i\ln\alpha_i+(1-y_i)\ln(1-\alpha_i)\bigr]\qquad(2)$$

In Equation (2), $\alpha_i$ is the actual output of the $i$-th default bounding box and $y_i$ is its expected output.
The average cross-entropy of the $n$ default bounding boxes is therefore:

$$L=-\frac{1}{n}\sum_{i=1}^{n}\bigl[y_{i,n}\ln\alpha_{i,n}+(1-y_{i,n})\ln(1-\alpha_{i,n})\bigr]\qquad(3)$$

In Equation (3), $y_{i,n}$ is the expected output of the $i$-th default bounding box when the number of matched default bounding boxes is $n$, and $\alpha_{i,n}$ is the corresponding actual output.
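The averaged cross-entropy can be sketched as follows (the original equations are rendered only as images, so the standard binary cross-entropy form is assumed here; the clamping constant is an implementation detail, not from the original):

```python
import math

def binary_cross_entropy(y, a, eps=1e-12):
    """Average binary cross-entropy between expected outputs y_i (0/1)
    and actual outputs a_i over the n matched default bounding boxes."""
    n = len(y)
    total = 0.0
    for yi, ai in zip(y, a):
        ai = min(max(ai, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += yi * math.log(ai) + (1.0 - yi) * math.log(1.0 - ai)
    return -total / n

print(round(binary_cross_entropy([1.0, 0.0], [0.9, 0.1]), 5))  # 0.10536
```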
Further, according to an embodiment of the present invention, in a specific scene such as an offshore platform, the color of the safety clothing of human targets matches the color and shape of certain cylindrical pipes, so conventional models designed for simple scenes often confuse the two, leading to a high false-alarm rate. In this embodiment, SVM posterior discrimination is performed on these two kinds of targets: a large set of manually annotated images is fed to an SVM to pre-train a classifier separating human targets from cylindrical-pipe targets; after the confidence discrimination, a local SVM binary re-classification is performed, the identified cylindrical pipes are removed as negative samples, and the score is estimated only on the basis of the confidence of the positive-sample person category to decide whether the detection is a genuine human target, reducing the computation spent on negative samples. The overall objective loss function of the double discrimination is the weighted average of the confidence loss and the localization scoring loss:

$$L(\alpha,c,f)=\frac{1}{N}\bigl(L(\alpha,c)+\delta\,L(\alpha,f)\bigr)\qquad(4)$$

In Equation (4), $\delta$ is the initial weight term.
Further, the initial weight term $\delta$ is set to 1 by cross-validation. When the expected output is evaluated by confidence, the output is the confidence $C$ of each class, and the confidence loss function $L(\alpha,c)$ is:

$$L(\alpha,c)=-\frac{1}{N}\sum_{i=1}^{N}\bigl[y_{i,N}\ln\alpha_{i,N}+(1-y_{i,N})\ln(1-\alpha_{i,N})\bigr]\qquad(5)$$

In Equation (5), $y_{i,N}$ is the expected output of the $i$-th default bounding box when the number of matched default bounding boxes is $N$, $\alpha_{i,N}$ is the corresponding actual output, and $N$ is the number of default bounding boxes matched to ground-truth bounding boxes; if $N=0$, the confidence loss is set to 0. Let $\alpha_{ij}^{p}=1$ denote that the $i$-th default bounding box matches the $j$-th ground-truth bounding box of class $p$, and $\alpha_{ij}^{p}=0$ otherwise. The localization scoring loss function is:
$$L(\alpha,f)=\sum_{i=1}^{N}\max\bigl(0,\;\lvert\hat{f}_{j}-f_{\alpha_i}\rvert-\Delta\bigr)\qquad(6)$$

In Equation (6), $\hat{f}_{j}$ is the score of the default bounding box matched to a ground-truth bounding box; $f_{j}$ is the preset score of the default bounding box; $f_{\alpha_i}$ is the actual score of the $\alpha_i$-th default bounding box; and $\Delta$ is the margin.
The overall objective loss function lets the localization scoring loss function find a global minimum through a gradual process, so that the score difference is minimized and the predictions become more accurate, and the target bounding box is adjusted to better match the shape of the target object.
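The combination of the two loss terms can be sketched as follows (the rendered formula is only an image in the source, so the `(1/N)(L_conf + delta * L_loc)` form with the zero-matches rule is an assumption consistent with the surrounding definitions):

```python
def overall_loss(conf_loss, loc_loss, num_matched, delta=1.0):
    """Weighted combination of the confidence loss and the localization
    scoring loss, normalised by the number of matched default boxes;
    returns 0 when no default box matches, as the text specifies."""
    if num_matched == 0:
        return 0.0
    return (conf_loss + delta * loc_loss) / num_matched

print(overall_loss(2.0, 1.0, num_matched=2))  # 1.5
```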
FIG. 4 is a flowchart of the human key-point detection algorithm of the present invention. The algorithm comprises:
(S410) taking the discretized human-target bounding-box coordinates obtained by target detection as the initial input of the algorithm, and extracting features through a series of convolution operations to obtain feature maps;

(S411) performing body-part localization and association analysis simultaneously on two branches: the former finds all key points, namely 14 key points comprising the head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee and left ankle; the latter determines the degree of association between all parts to establish relative positional relationships;

(S412) composing the body-part localization algorithm of a series of predictors divided into multiple stages, each stage repeatedly generating a confidence map for each body part, each confidence map containing one kind of key point; each confidence map, together with the original image features, serves as input to the next stage, which predicts the position of each part and thereby determines the positions of the key points of the human body;

(S413) encoding the positions and orientations of the body parts, and resolving the assignment of key points among multiple people by the directions of the vectors in the two-dimensional vector field;

(S414) establishing the relative positional relationships between body parts from the displacement lengths between vectors, thereby predicting and estimating invisible key points and finally obtaining detailed information on all key points of the human body.
In step S412, the confidence maps at all scales are accumulated for each part to obtain a total confidence map, and the point with the highest confidence is found; that point is the position of the corresponding key point.
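The accumulation-and-argmax step can be sketched as follows (a minimal sketch with confidence maps represented as nested lists; array shapes are illustrative):

```python
def keypoint_from_maps(maps):
    """Sum one part's confidence maps over all scales, then return the
    (row, col) of the maximum of the total map, i.e. the key-point
    location. `maps` is a list of equal-sized 2-D lists."""
    rows, cols = len(maps[0]), len(maps[0][0])
    total = [[sum(m[r][c] for m in maps) for c in range(cols)]
             for r in range(rows)]
    best = max((total[r][c], (r, c)) for r in range(rows) for c in range(cols))
    return best[1]

scale1 = [[0.1, 0.2], [0.3, 0.1]]
scale2 = [[0.0, 0.1], [0.4, 0.0]]
print(keypoint_from_maps([scale1, scale2]))  # (1, 0)
```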
The human key-point detection algorithm extracts features from the input image at every scale to obtain a confidence map for each body part; the larger the confidence value, the darker the color on the confidence map, the color depth being relative within the whole map. The algorithm of the present invention expresses the spatial constraints between parts with per-part confidence maps while processing the input feature maps and response maps at multiple scales, which both ensures precision and accounts for the distance relationships between parts; by continually enlarging the receptive field of the network to detect the positions of other parts, accurate detection of all human key points is finally achieved.
Specifically, according to an embodiment of the present invention, the human-target bounding box obtained by target detection may carry a certain error, so that parts of the human target may not be fully contained in the bounding box. This embodiment therefore enlarges the receptive field in a multi-scale manner to reduce the error introduced by target detection. Specifically, the original bounding box is enlarged at a ratio of 1.0:1.2, yielding the complete human target so that all key-point coordinates can be detected in the human key-point detection stage. After feature extraction with the convolutional network, the confidence value of each part is predicted directly from the original image, producing the corresponding confidence maps, which include one background confidence map. With the human body divided into P model parts, there are P+1 layers of confidence maps; the preset value of P is 14. Let x be a pixel with salient features in the image; the original image is fed into the network and the salient features, mainly texture features, are extracted by convolution operations. Let $C_1$ denote the classifier of the first stage; it roughly predicts the position of each part and thus produces the confidence map of each part. The classifier structure is:

$$C_1(x_i)\;\rightarrow\;\bigl\{c_1^{p}(x_i)\;\bigm|\;p\in\{0,1,\ldots,P\}\bigr\}$$

where $Z$ denotes the pixel space of the image, $x_i\in Z$ is the position of each pixel in the image, $p$ denotes a specific model part, and $c_1^{p}(x_i)$ is the confidence value of part $p$ in the first stage.
将第一阶段得到的置信图与图像特征作为第二阶段的输入数据,同时将原始图像再次作为输入,随着网络的接受域不断扩大,学习到的特征也会与前一阶段有所不同,所使用的特征函数包括图像数据特征、该阶段各各部位的置信图以及各级分类器的上下文信息。分类器C 2继续预测各部位的位置,是对前一阶段预测位置的修正,总体目标F(t)如下所示: The confidence map and image features obtained in the first stage are used as input data in the second stage, and the original image is used as input again. As the acceptance domain of the network continues to expand, the learned features will be different from the previous stage. The feature functions used include the image data features, the confidence map of each part at this stage, and the context information of the classifiers at all levels. The classifier C 2 continues to predict the position of each part, which is a modification of the predicted position in the previous stage. The overall target F (t) is as follows:
Figure PCTCN2018096157-appb-000029
Figure PCTCN2018096157-appb-000029
式(7)中,
Figure PCTCN2018096157-appb-000030
表示理想置信度在t∈T阶段取得。通过对两个阶段的不断迭代,使得预测部位位置更加精确,最终得到每个部位的较为精确的位置。
In equation (7),
Figure PCTCN2018096157-appb-000030
denotes that the ideal confidence is obtained at stage t ∈ T. By iterating continuously between the two stages, the predicted part positions become increasingly accurate, finally yielding a relatively precise position for each part.
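The stage-wise refinement described above (the confidence maps from one stage fed, together with image features, into the next) can be sketched with numpy. The shapes, the single linear map standing in for the convolutional classifier C_t, and the per-pixel softmax normalization are illustrative assumptions, not the patent's actual network:

```python
import numpy as np

P = 14  # body parts; confidence maps carry P + 1 channels (parts + background)

def stage(features, prev_maps, weights):
    """One refinement stage: combine image features with the previous stage's
    confidence maps and emit P + 1 new confidence maps."""
    inp = np.concatenate([features, prev_maps], axis=0)   # (C + P + 1, H, W)
    out = np.tensordot(weights, inp, axes=([1], [0]))     # (P + 1, H, W)
    e = np.exp(out - out.max(axis=0, keepdims=True))      # softmax over channels
    return e / e.sum(axis=0, keepdims=True)               # per-pixel confidence distribution

rng = np.random.default_rng(0)
H = W = 8
feats = rng.normal(size=(4, H, W))            # stand-in image features
maps = np.full((P + 1, H, W), 1.0 / (P + 1))  # uniform initial beliefs
Wt = rng.normal(size=(P + 1, 4 + P + 1))
for _ in range(3):                            # iterate a few stages
    maps = stage(feats, maps, Wt)
print(maps.shape)  # (15, 8, 8)
```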
进一步需要知道的,对于多帧光流图可通过光流法设定光流阈值,提取出视频中有效的运动区域,筛选出带有人员目标的视频片段用以单帧图像转换。为了产生随机的提取帧,设定每隔24帧进行一次哈希函数计算,每次选择一个随机函数random,取每帧所在的帧编号为它的哈希地址,得到随机生成的帧编号,即为提取帧。It should further be noted that, for the multi-frame optical flow maps, an optical flow threshold can be set by the optical flow method to extract the effective motion areas in the video and screen out the video segments containing human targets for single-frame image conversion. To generate random extraction frames, a hash function calculation is performed every 24 frames: each time a random function random is selected, the frame number of each frame is taken as its hash address, and a randomly generated frame number is obtained, which is the frame to extract.
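The sampling rule above (one randomly hashed pick per 24-frame window) might be sketched as follows; the exact hash computation is not specified in the patent, so a seeded random draw per window is an assumption:

```python
import random

def sample_frames(total_frames, window=24, seed=42):
    """Pick one frame per `window`-frame segment: the frame number serves as
    the hash address, and a random draw selects one frame per segment."""
    rng = random.Random(seed)
    picks = []
    for start in range(0, total_frames, window):
        end = min(start + window, total_frames)
        picks.append(rng.randrange(start, end))  # randomly generated frame number
    return picks

frames = sample_frames(120)
print(len(frames))  # 5 windows of 24 frames -> 5 extracted frames
```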
对于多帧光流图的约束方程,设定目标移动距离足够小,同时移动所需的时间也可以忽略不计,那么通过泰勒公式对多帧光流图的约束方程进行变换,如下所示:For the constraint equation of the multi-frame optical flow map, assume that the target's moving distance is sufficiently small and that the time required for the movement is negligible; the constraint equation can then be transformed by the Taylor formula, as shown below:
I x×V x + I y×V y + I z×V z = -I t    (8)
式(8)中,I x,I y,I z,I t分别为I(x,y,z,t)在x,y,z,t处的分量,V x,V y,V z分别是I(x,y,z,t)的光流向量中x,y,z的组成,三个偏微分则由图像在(x,y,z,t)这一像素点上相应方向的差分来近似。In formula (8), I x, I y, I z, and I t are the components of I(x,y,z,t) at x, y, z, and t respectively, and V x, V y, and V z are the x, y, and z components of the optical flow vector of I(x,y,z,t); the three partial derivatives are approximated by finite differences of the image in the corresponding directions at the pixel (x,y,z,t).
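The brightness-constancy constraint of equation (8) can be checked numerically in its 2-D form (the z term dropping out for a planar image); the synthetic sinusoidal pattern and step sizes below are illustrative assumptions:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 64)
X, Y = np.meshgrid(x, x)                        # X varies along columns, Y along rows
Vx, Vy, dt = 0.05, 0.02, 1.0                    # small motion, as the derivation assumes
I0 = np.sin(X) * np.cos(Y)                      # frame at time t
I1 = np.sin(X - Vx * dt) * np.cos(Y - Vy * dt)  # same pattern shifted by (Vx, Vy)

Iy, Ix = np.gradient(I0, x, x)                  # spatial derivatives (axis 0 = y, axis 1 = x)
It = (I1 - I0) / dt                             # temporal derivative
residual = np.abs(Ix * Vx + Iy * Vy + It).max()
print(residual < 0.01)                          # the constraint holds to first order
```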
二维矢量场的形成方法,具体地为:通过在时间t上进行连续提取多帧得到光流图,给图像中的每个像素点赋予一个速度矢量形成一个运动矢量场,通过预处理操作得到连续帧之间的光流位移堆叠场而形成的二维矢量场。The two-dimensional vector field is formed as follows: optical flow maps are obtained by continuously extracting multiple frames over time t, and each pixel in the image is assigned a velocity vector to form a motion vector field; a preprocessing operation then stacks the optical-flow displacements between successive frames to form the two-dimensional vector field.
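The displacement-stacking step might look like the following minimal sketch, assuming the flows are given as per-pixel (dx, dy) arrays; the function name and shapes are assumptions:

```python
import numpy as np

def stack_flows(flows):
    """Stack per-frame optical-flow displacement fields (dx, dy) taken at
    consecutive times into one two-dimensional vector field: each pixel
    accumulates its displacement over the clip."""
    stacked = np.zeros_like(flows[0])
    for f in flows:
        stacked += f            # accumulate displacement between successive frames
    return stacked

H, W = 4, 4
flows = [np.full((H, W, 2), (1.0, 0.5)) for _ in range(3)]  # 3 frame pairs
field = stack_flows(flows)
print(field[0, 0])  # net motion vector at a pixel: [3.  1.5]
```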
进一步地,根据本发明一实施例,对于多人关键点检测的问题,检测不同人的身体部位,还需要将每个人的身体分别组合在一起,形成一个完整的人体,使用的方法就是二维矢量场。它是一个2D向量集合,每一个2D向量集合都会编码一个人体部位的位置和方向,将位置和方向信息存储在向量中,每一个向量都会在关联的两个人体部位之间有一个亲和区域,其中的每一个像素都有一个2D向量的描述方向。亲和区通过响应图的方式存在,维度是二维的。若某个点有多人重叠,则将n个人的向量求和,再除以人数。Further, according to an embodiment of the present invention, for multi-person keypoint detection, after the body parts of different people are detected, each person's parts must also be assembled into a complete human body; the method used is the two-dimensional vector field. It is a set of 2D vectors, each of which encodes the position and orientation of a human body part; the position and orientation information is stored in the vector, and each vector has an affinity area between the two associated body parts, in which every pixel carries a 2D vector describing the direction. The affinity area exists in the form of a response map, and its dimension is two. If multiple people overlap at a certain point, the vectors of the n people are summed and then divided by the number of people.
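The overlap rule in the last sentence (sum the n people's vectors, divide by the number of people) is a simple average; a minimal sketch, assuming the per-person limb vectors at the pixel are given:

```python
import numpy as np

def affinity_at(vectors):
    """Affinity value at a pixel where n people's limb vectors overlap:
    sum the vectors and divide by the number of people, as described."""
    v = np.asarray(vectors, dtype=float)
    return v.sum(axis=0) / len(v)

# Two overlapping people whose limbs point right and up, respectively.
print(affinity_at([[1.0, 0.0], [0.0, 1.0]]))  # [0.5 0.5]
```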
在视频中设定检测区域,在复杂场景下通过本发明的方法进行目标检测,对人员目标进行定位、识别和跟踪,对同一人员目标在该区域内运动超过一定时间的事件进行徘徊检测,可用于银行、政府机关、大使馆、文化与宗教聚集地、高安全周界、商业区和住宅区等场所的智能监控,发现可疑目标并及时发出警告,以排除安全隐患。A detection area is set in the video; target detection is performed by the method of the present invention in complex scenes to locate, identify, and track human targets, and loitering detection is performed for events in which the same human target moves within the area beyond a certain time. The method can be used for intelligent monitoring in banks, government agencies, embassies, cultural and religious gathering places, high-security perimeters, and commercial and residential areas, discovering suspicious targets and issuing timely warnings to eliminate potential security risks.
而且,本发明的方法通过精确分析和定位人体的关键点,在此基础上能够进行人员的行为和姿态判别,可应用于石油、工业、医疗和安保等多个领域,这些领域面临着诸多安全隐患因素,例如石油钻采生产作业的人员不慎坠海、工业生产人员不符合规定佩戴安全设备以及老人、病人摔倒等情况。本发明的方法可以减少人工干预的时间,避免了因人身意外和违规操作生产造成的经济损失,从而保障了工业的安全生产,节省了人力物力,提高了生产管理水平。Moreover, by accurately analyzing and locating the key points of the human body, the method of the present invention can on that basis discriminate personnel behavior and posture, and can be applied in many fields such as petroleum, industry, medical care, and security. These fields face many hidden safety hazards, for example, personnel in oil drilling and production operations accidentally falling into the sea, industrial workers failing to wear safety equipment as required, and the elderly or patients falling. The method of the present invention can reduce the time of manual intervention and avoid economic losses caused by personal accidents and illegal production operations, thereby ensuring safe industrial production, saving manpower and material resources, and improving the level of production management.
一种基于复杂场景下的人体关键点检测系统,如图5所示,为本发明的基于复杂场景下的人体关键点检测系统的结构图,该系统包含:数据预处理模块,其对监控视频信息进行处理,以获得单帧静态图和多帧光流图;人员目标检测模块,其通过卷积操作提取数据预处理模块发送的单帧静态图的特征,使用小卷积核卷积滤波器预测每个边界框中物体的实际边界框并计算实际置信度,将实际边界框作为目标包围盒,采用SVM后验判别将实际置信度与预设置信度进行判别,以去除无效的包围盒,修正目标包围盒位置,获得离散化人员目标包围盒;以及人体关键点检测模块,其接收人员目标检测模块发送的离散化人员目标包围盒坐标,通过卷积操作提取特征以得到特征图,并获得部位的关键点和关联程度,利用预测器为人体每个部位生成部位置信图,通过部位置信图和二维矢量场实现人体关键点的精准检测。A human keypoint detection system based on complex scenes is shown in FIG. 5, which is a structural diagram of the complex-scene human keypoint detection system of the present invention. The system comprises: a data preprocessing module, which processes surveillance video information to obtain single-frame static images and multi-frame optical flow maps; a human target detection module, which extracts features of the single-frame static images sent by the data preprocessing module through convolution operations, uses small-kernel convolution filters to predict the actual bounding box of the object in each bounding box and to compute the actual confidence, takes the actual bounding box as the target bounding box, and applies SVM posterior discrimination between the actual confidence and a preset confidence to remove invalid bounding boxes, correct the target bounding box positions, and obtain discretized human target bounding boxes; and a human keypoint detection module, which receives the discretized human target bounding box coordinates sent by the human target detection module, extracts features through convolution operations to obtain feature maps, obtains the keypoints of the parts and their degrees of association, uses predictors to generate a part confidence map for each part of the human body, and achieves accurate detection of human keypoints through the part confidence maps and the two-dimensional vector field.
其中,人体关键点检测模块采用若干阶段迭代的方式,将前一阶段获得的置信图与提取的特征作为下一阶段的输入,以在若干阶段之间不断迭代,获得精确的部位置信图。具体地,该人体关键点检测模块采用上述基于复杂场景下的人体关键点检测方法中的人体关键点检测算法的步骤操作。The human keypoint detection module iterates over several stages, taking the confidence maps obtained in the previous stage together with the extracted features as the input of the next stage, so as to iterate continuously between stages and obtain accurate part confidence maps. Specifically, the human keypoint detection module operates according to the steps of the human keypoint detection algorithm in the above complex-scene human keypoint detection method.
其中,人员目标检测模块也采用上述基于复杂场景下的人体关键点检测方法中的人员目标检测算法的步骤操作。The human target detection module likewise operates according to the steps of the human target detection algorithm in the above complex-scene human keypoint detection method.
综上所述,本发明的基于复杂场景下的人体关键点检测系统及方法在复杂场景下对人员目标的所有关键点进行快速准确的检测,能够应用于多个领域进行定位、识别、跟踪以及行为和姿态判别。To sum up, the complex-scene-based human keypoint detection system and method of the present invention quickly and accurately detect all keypoints of human targets in complex scenes, and can be applied in many fields for positioning, identification, tracking, and behavior and posture discrimination.
尽管本发明的内容已经通过上述优选实施例作了详细介绍,但应当认识到上述的描述不应被认为是对本发明的限制。在本领域技术人员阅读了上述内容后,对于本发明的多种修改和替代都将是显而易见的。因此,本发明的保护范围应由所附的权利要求来限定。Although the content of the present invention has been described in detail through the above preferred embodiments, it should be recognized that the above description should not be considered as limiting the present invention. After reading the above content by those skilled in the art, various modifications and alternatives to the present invention will be apparent. Therefore, the protection scope of the present invention should be defined by the appended claims.

Claims (10)

  1. 一种基于复杂场景下的人体关键点检测方法,其特征在于,该方法包含:A method for detecting key points of a human body based on a complex scene is characterized in that the method includes:
    (S100)输入监控视频信息,进行预处理得到单帧静态图和多帧光流图;(S100) Inputting surveillance video information and preprocessing it to obtain single-frame static images and multi-frame optical flow maps;
    (S200)对单帧静态图通过卷积操作提取特征以得到特征图,为解决复杂场景下干扰目标对人员目标检测的影响,采用人员目标检测算法,以对特征图的实际置信度与预设置信度进行判别,去除非人员目标,得到离散化人员目标包围盒;(S200) Extracting features from the single-frame static image through convolution operations to obtain a feature map; to address the influence of interfering targets on human target detection in complex scenes, using a human target detection algorithm to discriminate the actual confidence of the feature map against a preset confidence, removing non-human targets, and obtaining discretized human target bounding boxes;
    (S300)对多帧光流图采用光流堆叠来形成二维矢量场;(S300) Use optical flow stacking for multi-frame optical flow diagrams to form a two-dimensional vector field;
    (S400)提取所述的离散化人员目标包围盒中特征,得到特征图,获得部位的关键点和关联程度,利用预测器为人体每个部位生成部位置信图,通过部位置信图和二维矢量场实现人体关键点的精准检测;(S400) Extracting features from the discretized human target bounding box to obtain a feature map, obtaining the keypoints of the parts and their degrees of association, using a predictor to generate a part confidence map for each part of the human body, and achieving accurate detection of human keypoints through the part confidence maps and the two-dimensional vector field;
    在所述的步骤S400中,在第一阶段,扩大离散化人员目标包围盒,以原始图像作为输入,采用卷积操作提取特征后,从原始图像通过分类器预测每个部位的置信值,产生对应的置信图,且将前一阶段获得的置信图与提取的特征作为下一阶段的输入,在若干阶段之间不断迭代,以获得精确的部位置信图。In said step S400, in the first stage, the discretized human target bounding box is enlarged and the original image is used as input; after features are extracted using convolution operations, a classifier predicts the confidence value of each part from the original image to generate the corresponding confidence map, and the confidence maps obtained in the previous stage together with the extracted features are used as the input of the next stage, iterating continuously between several stages to obtain accurate part confidence maps.
  2. 根据权利要求1所述的基于复杂场景下的人体关键点检测方法,其特征在于,所述的人员目标检测算法包括:The method for detecting key points of a human body based on a complex scene according to claim 1, wherein the human target detection algorithm comprises:
    (S210)对不同尺寸的单帧静态图产生一组固定大小的默认边界框集合,对该组默认边界框内的区域进行特征提取;(S210) Generate a set of fixed-size default bounding box sets for single-frame still images of different sizes, and perform feature extraction on the regions within the set of default bounding boxes;
    (S211)对人员目标的形体表征,提取主要特征,以形成不同层次的特征图单元,作为图像数据集,将每个层次的特征图单元以卷积的方式平铺特征映射,使得每个默认边界框与相对应的特征图单元的位置固定;(S211) For the physical characterization of the human target, extracting the main features to form feature map units at different levels as an image data set, and tiling the feature maps of the feature map units at each level in a convolutional manner, so that the position of each default bounding box relative to its corresponding feature map unit is fixed;
    (S212)在所述的每个特征图单元上使用小卷积核卷积滤波器预测每个默认边界框中物体的实际边界框,该实际边界框作为目标包围盒,并计算出实际置信度,将实际置信度与预设置信度进行判别,以去除无效的包围盒,以修正目标包围盒位置;(S212) Using a small-kernel convolution filter on each of said feature map units to predict the actual bounding box of the object in each default bounding box, the actual bounding box serving as the target bounding box, computing the actual confidence, and discriminating the actual confidence against the preset confidence to remove invalid bounding boxes and correct the target bounding box position;
    (S213)输出在不同层次上的离散化目标包围盒,其具有不同的长宽比尺度。(S213) Output discrete target bounding boxes at different levels, which have different aspect ratio scales.
  3. 根据权利要求2所述的基于复杂场景下的人体关键点检测方法,其特征在于,在所述的步骤S212中,在进行置信度判别过程中,需要计算出每个默认边界框与相对应的实际边界框的误差和相应的评分,以预测默认边界框区域内的所有目标的类别和置信度;The method for detecting human keypoints based on complex scenes according to claim 2, characterized in that, in said step S212, during the confidence discrimination process, the error between each default bounding box and the corresponding actual bounding box, together with the corresponding score, needs to be computed, so as to predict the categories and confidences of all targets within the default bounding box region;
    设定所述的预设置信度的阈值;当所述的实际置信度大于该阈值时,进行模型损失计算;当所述的实际置信度小于该阈值时,进行SVM后验判别;当判别为人员目标时,则微调目标包围盒;当判别为非人员目标时,剔除无效的包围盒。Setting a threshold for said preset confidence; when said actual confidence is greater than the threshold, performing model loss computation; when said actual confidence is less than the threshold, performing SVM posterior discrimination; when a human target is determined, fine-tuning the target bounding box; and when a non-human target is determined, removing the invalid bounding box.
  4. 根据权利要求3所述的基于复杂场景下的人体关键点检测方法,其特征在于,所述的模型损失计算通过损失函数完成,损失函数为:The method for detecting key points of a human body based on a complex scene according to claim 3, wherein the model loss calculation is completed by a loss function, and the loss function is:
    Figure PCTCN2018096157-appb-100001
    式(1)中,L(e)是损失误差,y是期望输出,α为实际输出;In formula (1), L (e) is the loss error, y is the expected output, and α is the actual output;
    对y的分布进行矩估计,用α来表示y的交叉熵为:Perform moment estimation on the distribution of y, and use α to represent the cross entropy of y as:
    Figure PCTCN2018096157-appb-100002
    式(2)中,α i是第i个默认边界框的实际输出,y i是第i个默认边界框的期望输出; In Equation (2), α i is the actual output of the ith default bounding box, and y i is the expected output of the ith default bounding box;
    n个默认边界框的平均交叉熵为:The average cross entropy of the n default bounding boxes is:
    Figure PCTCN2018096157-appb-100003
    式(3)中,y i,n表示当相匹配的默认边界框的数量为n时,第i个默认边界框的期望输出;α i,n表示当相匹配的默认边界框的数量为n时,第i个默认边界框的实际输出。In formula (3), y i,n represents the expected output of the i-th default bounding box when the number of matching default bounding boxes is n, and α i,n represents the actual output of the i-th default bounding box when the number of matching default bounding boxes is n.
  5. 根据权利要求4所述的基于复杂场景下的人体关键点检测方法,其特征在于,在所述的步骤S212中,当存在混淆目标时,对人员目标和混淆目标进行SVM后验判别,将大量人工标注的图像数据集送入SVM预先训练好人员目标和混淆目标的分类器中,在置信度判别后进行本地SVM二分类再判别,将识别出的混淆目标作为负样本去除,人员目标作为正样本,在正样本人员类别的置信度基础上,进行评分确定是否为真实的人员目标。The method for detecting human keypoints based on complex scenes according to claim 4, characterized in that, in said step S212, when confusing targets exist, SVM posterior discrimination is performed on the human targets and the confusing targets: a large number of manually labeled image data sets are fed into SVM classifiers pre-trained on human targets and confusing targets, a local SVM binary re-classification is performed after the confidence discrimination, the identified confusing targets are removed as negative samples, and the human targets serve as positive samples, which are scored on the basis of the confidence of the positive-sample human category to determine whether they are real human targets.
  6. 根据权利要求5所述的基于复杂场景下的人体关键点检测方法,其特征在于,双重判别的总体目标损失函数是置信度损失和本地化评分损失的加权平均和,该总体目标损失函数为:The method for detecting human key points based on a complex scene according to claim 5, wherein the overall target loss function of the double discrimination is a weighted average sum of the confidence loss and the localization score loss, and the overall target loss function is:
    Figure PCTCN2018096157-appb-100004
    式(4)中,δ为初始权重项;N是与实际边界框相匹配的默认边界框的数量;L(α,c)为置信度的损失函数;L(α,f)为本地化评分损失函数;In formula (4), δ is the initial weight term; N is the number of default bounding boxes matching the actual bounding box; L(α,c) is the confidence loss function; and L(α,f) is the localization score loss function;
    通过交叉验证将所述的初始权重项δ设置为1;当以置信度评价期望输出时,输出为每一类的置信度C,则置信度的损失函数L(α,c)为:The initial weight term δ is set to 1 through cross-validation; when the expected output is evaluated by confidence, the output is the confidence C of each category, and the confidence loss function L(α,c) is:
    Figure PCTCN2018096157-appb-100005
    式(5)中,y i,N表示当相匹配的默认边界框的数量为N时,第i个默认边界框的期望输出;α i,N表示当相匹配的默认边界框的数量为N时,第i个默认边界框的实际输出;In formula (5), y i,N represents the expected output of the i-th default bounding box when the number of matching default bounding boxes is N, and α i,N represents the actual output of the i-th default bounding box when the number of matching default bounding boxes is N;
    当N=0时,所述的置信度损失为0;When N = 0, the confidence loss is 0;
    Figure PCTCN2018096157-appb-100006
    时,表示第i个默认边界框与类别p的第j个实际边界框相匹配;
    when
    Figure PCTCN2018096157-appb-100006
    it means that the i-th default bounding box matches the j-th actual bounding box of category p;
    Figure PCTCN2018096157-appb-100007
    时,表示第i个默认边界框与类别p的第j个实际边界框不匹配,本地化评分损失函数为:
    when
    Figure PCTCN2018096157-appb-100007
    it means that the i-th default bounding box does not match the j-th actual bounding box of category p, and the localization score loss function is:
    Figure PCTCN2018096157-appb-100008
    式(6)中,
    Figure PCTCN2018096157-appb-100009
    表示默认边界框与实际边界框相匹配的评分;f j表示默认边界框的预设评分,
    Figure PCTCN2018096157-appb-100010
    表示第α i个默认边界框的实际评分;Δ表示间隔。
    In formula (6),
    Figure PCTCN2018096157-appb-100009
    represents the score for matching the default bounding box to the actual bounding box; f j represents the preset score of the default bounding box, and
    Figure PCTCN2018096157-appb-100010
    represents the actual score of the α i-th default bounding box; Δ represents the interval (margin).
  7. 根据权利要求1-6中任意一项所述的基于复杂场景下的人体关键点检测 方法,其特征在于,所述的第一个阶段的分类器C 1的结构为: The method for detecting key points of a human body based on a complex scene according to any one of claims 1-6, wherein the structure of the classifier C 1 in the first stage is:
    Figure PCTCN2018096157-appb-100011
    其中,
    Figure PCTCN2018096157-appb-100012
    表示图像的像素空间,x i表示图像中每个像素的位置,p表示具体模型部位,
    Figure PCTCN2018096157-appb-100013
    表示第一阶段中部位p的置信值;
    where
    Figure PCTCN2018096157-appb-100012
    represents the pixel space of the image, x i represents the position of each pixel in the image, p represents a specific model part, and
    Figure PCTCN2018096157-appb-100013
    represents the confidence value of part p in the first stage;
    通过将前一阶段获得的置信图与提取的特征作为下一阶段的数据输入,以对前一阶段的位置进行修正,总体目标F(t)为:The confidence map obtained in the previous stage and the extracted features are used as the data input of the next stage to correct the positions of the previous stage; the overall objective F(t) is:
    Figure PCTCN2018096157-appb-100014
    式(7)中,
    Figure PCTCN2018096157-appb-100015
    表示理想置信度在t∈T阶段取得。
    In equation (7),
    Figure PCTCN2018096157-appb-100015
    denotes that the ideal confidence is obtained at stage t ∈ T.
  8. 根据权利要求7所述的基于复杂场景下的人体关键点检测方法,其特征在于,在所述的步骤S300中,对所述的多帧光流图通过光流法设定光流阈值,提取出视频中有效运动区域,筛选出带有人员目标的视频片段以转换为单帧图像,并且设定每经任意一间隔帧进行哈希函数计算,选择一个随机函数random,取每帧所在的帧编号为其哈希地址,得到随机生成的帧编号为提取帧;The method for detecting human keypoints based on complex scenes according to claim 7, characterized in that, in said step S300, an optical flow threshold is set for said multi-frame optical flow maps by the optical flow method, the effective motion areas in the video are extracted, the video segments containing human targets are screened out for conversion into single-frame images, and a hash function calculation is performed at every arbitrary frame interval: a random function random is selected, the frame number of each frame is taken as its hash address, and the randomly generated frame number obtained is the frame to extract;
    通过泰勒公式将所述的多帧光流图的约束方程转换为:The constraint equation of the multi-frame optical flow diagram is converted into:
    I x×V x + I y×V y + I z×V z = -I t         (8)
    式(8)中,I x,I y,I z,I t分别为I(x,y,z,t)在x,y,z,t处的分量,V x,V y,V z分别是I(x,y,z,t)的光流向量中x,y,z的组成,I(x,y,z,t)为在(x,y,z)位置的体素;In formula (8), I x, I y, I z, and I t are the components of I(x,y,z,t) at x, y, z, and t respectively, V x, V y, and V z are the x, y, and z components of the optical flow vector of I(x,y,z,t), and I(x,y,z,t) is the voxel at position (x,y,z);
    所述的二维矢量场的形成方法包含:通过在时间t上进行连续提取多帧得到光流图,给图像中的每个像素点赋予一个速度矢量形成一个运动矢量场,通过预处理操作得到连续帧之间的光流位移堆叠场,以形成二维矢量场。The method for forming said two-dimensional vector field comprises: obtaining optical flow maps by continuously extracting multiple frames over time t, assigning a velocity vector to each pixel in the image to form a motion vector field, and obtaining, through a preprocessing operation, the stacked field of optical-flow displacements between successive frames to form the two-dimensional vector field.
  9. 根据权利要求8所述的基于复杂场景下的人体关键点检测方法,其特征在于,所述的人体关键点检测算法流程包括:The method for detecting a human keypoint based on a complex scene according to claim 8, wherein the human keypoint detection algorithm flow comprises:
    (S410)将目标检测得到的离散化人员目标包围盒坐标作为算法的初始输入,经过卷积操作提取特征得到特征图;(S410) Using the coordinates of the discretized human target bounding box obtained by the target detection as the initial input of the algorithm, extracting features through a convolution operation to obtain a feature map;
    (S411)身体部位定位和关联程度分析在两个分支上同时进行,通过身体部位定位求得所有的关键点,通过关联程度分析求得所有部位之间的关联程度,以建立相对位置关系;(S411) Body part localization and association degree analysis are performed simultaneously on two branches: all keypoints are obtained through body part localization, and the degrees of association between all parts are obtained through association degree analysis to establish relative position relationships;
    (S412)所述的身体部位定位的算法由预测器组成,分成若干阶段,每个阶段为人体每个部位重复生成置信图,每张置信图包含某一种关键点,该置信图与原始图像特征同时作为下一阶段的输入,预测各部位的位置,进而确定人体各关键点的位置;(S412) The body part localization algorithm described consists of predictors and is divided into several stages; each stage repeatedly generates a confidence map for each part of the human body, each confidence map containing one kind of keypoint, and the confidence maps together with the original image features serve as the input of the next stage to predict the positions of the parts and thereby determine the positions of the keypoints of the human body;
    (S413)对人体部位的位置和方向进行编码,通过在所述的二维矢量场中矢量的方向判别多人关键点的从属问题;(S413) Encoding the positions and directions of human body parts, and resolving which person each keypoint belongs to in the multi-person case by the directions of the vectors in said two-dimensional vector field;
    (S414)利用矢量之间的位移长度建立人体各部位之间的相对位置关系,实现人体不可见关键点的预测与估计,得到人体所有关键点的详细信息;(S414) Use the displacement length between the vectors to establish the relative position relationship between various parts of the human body, realize the prediction and estimation of invisible key points of the human body, and obtain detailed information of all key points of the human body;
    在所述的步骤S412中,对每个部位累加所有尺度下的置信图,得到总置信图,找出置信度最大的点,该点为相应的关键点的位置;In step S412, the confidence maps at all scales are accumulated for each part to obtain a total confidence map, and the point with the highest degree of confidence is found, which is the position of the corresponding key point;
    对于多人关键点检测,通过二维矢量场将每个人的身体组合在一起,形成一个完整的人体;当某个点有多人重叠时,将n个人的向量求和,再除以人数。For multi-person keypoint detection, each person's body is combined together through a two-dimensional vector field to form a complete human body; when multiple people overlap at a certain point, the vectors of n people are summed and divided by the number of people.
  10. 一种基于复杂场景下的人体关键点检测系统,其特征在于,该系统包含:A human body keypoint detection system based on a complex scene is characterized in that the system includes:
    数据预处理模块,其对监控视频信息进行处理,以获得单帧静态图和多帧光流图;Data pre-processing module, which processes the surveillance video information to obtain single-frame still images and multi-frame optical flow images;
    人员目标检测模块,其通过卷积操作提取所述的数据预处理模块发送的单帧静态图的特征,使用小卷积核卷积滤波器预测每个边界框中物体的实际边界框并计算实际置信度,将实际边界框作为目标包围盒,采用SVM后验判别将实际置信度与预设置信度进行判别,以去除无效的包围盒,以修正目标包围盒位置,获得离散化人员目标包围盒;以及a human target detection module, which extracts features of the single-frame static images sent by said data preprocessing module through convolution operations, uses small-kernel convolution filters to predict the actual bounding box of the object in each bounding box and compute the actual confidence, takes the actual bounding box as the target bounding box, and applies SVM posterior discrimination between the actual confidence and a preset confidence to remove invalid bounding boxes, correct the target bounding box position, and obtain discretized human target bounding boxes; and
    人体关键点检测模块,其接收所述的人员目标检测模块发送的离散化人员目标包围盒坐标,通过卷积操作提取特征以得到特征图,并获得部位的关键点和关联程度,利用预测器为人体每个部位生成部位置信图,通过部位置信图和二维矢量场实现人体关键点的精准检测;a human keypoint detection module, which receives the discretized human target bounding box coordinates sent by said human target detection module, extracts features through convolution operations to obtain feature maps, obtains the keypoints of the parts and their degrees of association, uses predictors to generate a part confidence map for each part of the human body, and achieves accurate detection of human keypoints through the part confidence maps and the two-dimensional vector field;
    其中,所述的人体关键点检测模块采用若干阶段迭代的方式,将前一阶段获得的置信图与提取的特征作为下一阶段的输入,以在若干阶段之间不断迭代,获得精确的部位置信图。wherein said human keypoint detection module iterates over several stages, taking the confidence maps obtained in the previous stage together with the extracted features as the input of the next stage, so as to iterate continuously between stages and obtain accurate part confidence maps.
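The average cross-entropy of formula (3) above admits a short numerical sketch (assuming the standard binary form; the function name and the clipping guard are illustrative assumptions, not part of the claims):

```python
import numpy as np

def cross_entropy(y, a, eps=1e-12):
    """Average cross-entropy over n default boxes:
    L = -(1/n) * sum_i [ y_i*ln(a_i) + (1 - y_i)*ln(1 - a_i) ]."""
    y, a = np.asarray(y, float), np.asarray(a, float)
    a = np.clip(a, eps, 1 - eps)   # guard the logarithms against 0 and 1
    return float(-np.mean(y * np.log(a) + (1 - y) * np.log(1 - a)))

# Confident correct predictions give a small loss; confident wrong ones a large loss.
print(cross_entropy([1, 0], [0.9, 0.1]))  # ~0.105
print(cross_entropy([1, 0], [0.1, 0.9]))  # ~2.303
```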
PCT/CN2018/096157 2018-06-05 2018-07-18 Complex scene-based human body key point detection system and method WO2019232894A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810582712.7 2018-06-05
CN201810582712.7A CN108710868B (en) 2018-06-05 2018-06-05 Human body key point detection system and method based on complex scene

Publications (1)

Publication Number Publication Date
WO2019232894A1 true WO2019232894A1 (en) 2019-12-12

Family

ID=63872233

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/096157 WO2019232894A1 (en) 2018-06-05 2018-07-18 Complex scene-based human body key point detection system and method

Country Status (2)

Country Link
CN (1) CN108710868B (en)
WO (1) WO2019232894A1 (en)

Cited By (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991388A (en) * 2019-12-16 2020-04-10 安徽小眯当家信息技术有限公司 Method for calculating character illumination view azimuth correction angle
CN111008631A (en) * 2019-12-20 2020-04-14 浙江大华技术股份有限公司 Image association method and device, storage medium and electronic device
CN111259790A (en) * 2020-01-15 2020-06-09 上海交通大学 Coarse-to-fine behavior rapid detection and classification method and system for medium-short time video
CN111259822A (en) * 2020-01-19 2020-06-09 杭州微洱网络科技有限公司 Method for detecting key point of special neck in E-commerce image
CN111368685A (en) * 2020-02-27 2020-07-03 北京字节跳动网络技术有限公司 Key point identification method and device, readable medium and electronic equipment
CN111369539A (en) * 2020-03-06 2020-07-03 浙江大学 Building facade window detecting system based on multi-feature map fusion
CN111402414A (en) * 2020-03-10 2020-07-10 北京京东叁佰陆拾度电子商务有限公司 Point cloud map construction method, device, equipment and storage medium
CN111428664A (en) * 2020-03-30 2020-07-17 厦门瑞为信息技术有限公司 Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision
CN111444828A (en) * 2020-03-25 2020-07-24 腾讯科技(深圳)有限公司 Model training method, target detection method, device and storage medium
CN111508019A (en) * 2020-03-11 2020-08-07 上海商汤智能科技有限公司 Target detection method, training method of model thereof, and related device and equipment
CN111524062A (en) * 2020-04-22 2020-08-11 北京百度网讯科技有限公司 Image generation method and device
CN111597974A (en) * 2020-05-14 2020-08-28 哈工大机器人(合肥)国际创新研究院 Monitoring method and system based on TOF camera for personnel activities in carriage
CN111667535A (en) * 2020-06-04 2020-09-15 电子科技大学 Six-degree-of-freedom pose estimation method for occlusion scene
CN111709336A (en) * 2020-06-08 2020-09-25 杭州像素元科技有限公司 Highway pedestrian detection method and device and readable storage medium
CN111832386A (en) * 2020-05-22 2020-10-27 大连锐动科技有限公司 Method and device for estimating human body posture and computer readable medium
CN111832526A (en) * 2020-07-23 2020-10-27 浙江蓝卓工业互联网信息技术有限公司 Behavior detection method and device
CN111860278A (en) * 2020-07-14 2020-10-30 陕西理工大学 Human behavior recognition algorithm based on deep learning
Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109544595B (en) * 2018-10-29 2020-06-16 苏宁易购集团股份有限公司 Customer path tracking method and system
CN109492581B (en) * 2018-11-09 2023-07-18 中国石油大学(华东) Human body action recognition method based on TP-STG framework
CN109558832B (en) 2018-11-27 2021-03-26 广州市百果园信息技术有限公司 Human body posture detection method, device, equipment and storage medium
CN109711273B (en) * 2018-12-04 2020-01-17 北京字节跳动网络技术有限公司 Image key point extraction method and device, readable storage medium and electronic equipment
CN111368594B (en) * 2018-12-26 2023-07-18 中国电信股份有限公司 Method and device for detecting key points
CN109766823A (en) * 2019-01-07 2019-05-17 浙江大学 High-resolution remote sensing ship detection method based on deep convolutional neural networks
CN109977997B (en) * 2019-02-13 2021-02-02 中国科学院自动化研究所 Image target detection and segmentation method based on convolutional neural network rapid robustness
CN110096983A (en) * 2019-04-22 2019-08-06 苏州海赛人工智能有限公司 Neural-network-based method for detecting construction workers' safety apparel in images
CN110046600B (en) * 2019-04-24 2021-02-26 北京京东尚科信息技术有限公司 Method and apparatus for human detection
CN110348290A (en) * 2019-05-27 2019-10-18 天津中科智能识别产业技术研究院有限公司 Visual detection method for coke tank truck safety early warning
CN110414348A (en) * 2019-06-26 2019-11-05 深圳云天励飞技术有限公司 Video processing method and device
CN110501339B (en) * 2019-08-13 2022-03-29 江苏大学 Cloth cover positioning method in complex environment
CN111062239A (en) * 2019-10-15 2020-04-24 平安科技(深圳)有限公司 Human body target detection method and device, computer equipment and storage medium
CN110717476A (en) * 2019-10-22 2020-01-21 上海眼控科技股份有限公司 Image processing method, image processing device, computer equipment and computer readable storage medium
CN110929711B (en) * 2019-11-15 2022-05-31 智慧视通(杭州)科技发展有限公司 Method for automatically associating identity information and shape information applied to fixed scene
CN111191690B (en) * 2019-12-16 2023-09-05 上海航天控制技术研究所 Space target autonomous identification method based on transfer learning, electronic equipment and storage medium
CN111079695B (en) * 2019-12-30 2021-06-01 北京华宇信息技术有限公司 Human body key point detection and self-learning method and device
CN111209829B (en) * 2019-12-31 2023-05-02 浙江大学 Vision-based identification method for static small and medium-scale targets from a moving platform
CN111246113B (en) * 2020-03-05 2022-03-18 上海瑾盛通信科技有限公司 Image processing method, device, equipment and storage medium
CN111798486B (en) * 2020-06-16 2022-05-17 浙江大学 Multi-view human motion capture method based on human motion prediction
CN111680705B (en) * 2020-08-13 2021-02-26 南京信息工程大学 MB-SSD method and MB-SSD feature extraction network suitable for target detection
CN112633178A (en) * 2020-12-24 2021-04-09 深圳集智数字科技有限公司 Image identification method and device, storage medium and electronic equipment
CN112784771B (en) * 2021-01-27 2022-09-30 浙江芯昇电子技术有限公司 Human shape detection method, system and monitoring equipment
CN113505763B (en) * 2021-09-09 2022-02-01 北京爱笔科技有限公司 Key point detection method and device, electronic equipment and storage medium
CN114240844B (en) * 2021-11-23 2023-03-14 电子科技大学 Unsupervised key point positioning and target detection method in medical image
CN114973334A (en) * 2022-07-29 2022-08-30 浙江大华技术股份有限公司 Human body part association method, device, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150154449A1 (en) * 2013-11-29 2015-06-04 Fujitsu Limited Method and apparatus for recognizing actions
CN106611157A (en) * 2016-11-17 2017-05-03 中国石油大学(华东) Multi-person posture recognition method based on optical flow positioning and sliding window detection
CN106909887A (en) * 2017-01-19 2017-06-30 南京邮电大学盐城大数据研究院有限公司 Action recognition method based on CNN and SVM
CN107256386A (en) * 2017-05-23 2017-10-17 东南大学 Human behavior analysis method based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780557B (en) * 2016-12-23 2020-06-09 南京邮电大学 Moving object tracking method based on optical flow method and key point features

Cited By (88)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991388A (en) * 2019-12-16 2020-04-10 安徽小眯当家信息技术有限公司 Method for calculating character illumination view azimuth correction angle
CN110991388B (en) * 2019-12-16 2023-07-14 小哆智能科技(北京)有限公司 Method for calculating azimuth correction angle of character illumination view
CN113012089A (en) * 2019-12-19 2021-06-22 北京金山云网络技术有限公司 Image quality evaluation method and device
CN111008631A (en) * 2019-12-20 2020-04-14 浙江大华技术股份有限公司 Image association method and device, storage medium and electronic device
CN111008631B (en) * 2019-12-20 2023-06-16 浙江大华技术股份有限公司 Image association method and device, storage medium and electronic device
CN111259790A (en) * 2020-01-15 2020-06-09 上海交通大学 Coarse-to-fine behavior rapid detection and classification method and system for medium-short time video
CN111259790B (en) * 2020-01-15 2023-06-20 上海交通大学 Method and system for quickly detecting and classifying behaviors from coarse to fine of medium-short-time video
CN111259822A (en) * 2020-01-19 2020-06-09 杭州微洱网络科技有限公司 Method for detecting key point of special neck in E-commerce image
CN113269013A (en) * 2020-02-17 2021-08-17 京东方科技集团股份有限公司 Object behavior analysis method, information display method and electronic equipment
CN111368685B (en) * 2020-02-27 2023-09-29 北京字节跳动网络技术有限公司 Method and device for identifying key points, readable medium and electronic equipment
CN111368685A (en) * 2020-02-27 2020-07-03 北京字节跳动网络技术有限公司 Key point identification method and device, readable medium and electronic equipment
CN111369539A (en) * 2020-03-06 2020-07-03 浙江大学 Building facade window detecting system based on multi-feature map fusion
CN111369539B (en) * 2020-03-06 2023-06-16 浙江大学 Building facade window detecting system based on multi-feature image fusion
CN111402414A (en) * 2020-03-10 2020-07-10 北京京东叁佰陆拾度电子商务有限公司 Point cloud map construction method, device, equipment and storage medium
CN111508019A (en) * 2020-03-11 2020-08-07 上海商汤智能科技有限公司 Target detection method, training method of model thereof, and related device and equipment
CN111444828A (en) * 2020-03-25 2020-07-24 腾讯科技(深圳)有限公司 Model training method, target detection method, device and storage medium
CN111428664B (en) * 2020-03-30 2023-08-25 厦门瑞为信息技术有限公司 Computer vision real-time multi-person gesture estimation method based on deep learning technology
CN111428664A (en) * 2020-03-30 2020-07-17 厦门瑞为信息技术有限公司 Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision
CN111524062B (en) * 2020-04-22 2023-11-24 北京百度网讯科技有限公司 Image generation method and device
CN111524062A (en) * 2020-04-22 2020-08-11 北京百度网讯科技有限公司 Image generation method and device
CN111597974A (en) * 2020-05-14 2020-08-28 哈工大机器人(合肥)国际创新研究院 Monitoring method and system based on TOF camera for personnel activities in carriage
CN111597974B (en) * 2020-05-14 2023-05-12 哈工大机器人(合肥)国际创新研究院 Monitoring method and system for personnel activities in carriage based on TOF camera
CN111832386A (en) * 2020-05-22 2020-10-27 大连锐动科技有限公司 Method and device for estimating human body posture and computer readable medium
CN111667535A (en) * 2020-06-04 2020-09-15 电子科技大学 Six-degree-of-freedom pose estimation method for occlusion scene
CN111709336A (en) * 2020-06-08 2020-09-25 杭州像素元科技有限公司 Highway pedestrian detection method and device and readable storage medium
CN111709336B (en) * 2020-06-08 2024-04-26 杭州像素元科技有限公司 Expressway pedestrian detection method, equipment and readable storage medium
CN111881754A (en) * 2020-06-28 2020-11-03 浙江大华技术股份有限公司 Behavior detection method, system, equipment and computer equipment
CN111914673B (en) * 2020-07-08 2023-06-16 浙江大华技术股份有限公司 Method and device for detecting target behavior and computer readable storage medium
CN111914667A (en) * 2020-07-08 2020-11-10 浙江大华技术股份有限公司 Smoking detection method and device
CN111914667B (en) * 2020-07-08 2023-04-07 浙江大华技术股份有限公司 Smoking detection method and device
CN111914673A (en) * 2020-07-08 2020-11-10 浙江大华技术股份有限公司 Target behavior detection method and device and computer readable storage medium
CN111860278A (en) * 2020-07-14 2020-10-30 陕西理工大学 Human behavior recognition algorithm based on deep learning
CN111860304A (en) * 2020-07-17 2020-10-30 北京百度网讯科技有限公司 Image labeling method, electronic device, equipment and storage medium
CN111860304B (en) * 2020-07-17 2024-04-30 北京百度网讯科技有限公司 Image labeling method, electronic device, equipment and storage medium
CN111881804B (en) * 2020-07-22 2023-07-28 汇纳科技股份有限公司 Posture estimation model training method, system, medium and terminal based on joint training
CN111881804A (en) * 2020-07-22 2020-11-03 汇纳科技股份有限公司 Attitude estimation model training method, system, medium and terminal based on joint training
CN111832526A (en) * 2020-07-23 2020-10-27 浙江蓝卓工业互联网信息技术有限公司 Behavior detection method and device
CN111860430B (en) * 2020-07-30 2023-04-07 浙江大华技术股份有限公司 Identification method and device of fighting behavior, storage medium and electronic device
CN111860430A (en) * 2020-07-30 2020-10-30 浙江大华技术股份有限公司 Identification method and device of fighting behavior, storage medium and electronic device
CN112069931A (en) * 2020-08-20 2020-12-11 深圳数联天下智能科技有限公司 State report generation method and state monitoring system
CN112085003A (en) * 2020-09-24 2020-12-15 湖北科技学院 Automatic identification method and device for abnormal behaviors in public places and camera equipment
CN112085003B (en) * 2020-09-24 2024-04-05 湖北科技学院 Automatic recognition method and device for abnormal behaviors in public places and camera equipment
CN112200076A (en) * 2020-10-10 2021-01-08 福州大学 Method for carrying out multi-target tracking based on head and trunk characteristics
CN112200076B (en) * 2020-10-10 2023-02-21 福州大学 Method for carrying out multi-target tracking based on head and trunk characteristics
CN112052843B (en) * 2020-10-14 2023-06-06 福建天晴在线互动科技有限公司 Face key point detection method from coarse face to fine face
CN112052843A (en) * 2020-10-14 2020-12-08 福建天晴在线互动科技有限公司 Method for detecting key points of human face from coarse to fine
CN112233131A (en) * 2020-10-22 2021-01-15 广州极飞科技有限公司 Method, device and equipment for dividing block and storage medium
CN112233131B (en) * 2020-10-22 2022-11-08 广州极飞科技股份有限公司 Method, device and equipment for dividing land block and storage medium
CN112257659B (en) * 2020-11-11 2024-04-05 四川云从天府人工智能科技有限公司 Detection tracking method, device and medium
CN112257659A (en) * 2020-11-11 2021-01-22 四川云从天府人工智能科技有限公司 Detection tracking method, apparatus and medium
CN112349150A (en) * 2020-11-19 2021-02-09 飞友科技有限公司 Video acquisition method and system for airport flight guarantee time node
CN112613382B (en) * 2020-12-17 2024-04-30 浙江大华技术股份有限公司 Method and device for determining object integrity, storage medium and electronic device
CN112613382A (en) * 2020-12-17 2021-04-06 浙江大华技术股份有限公司 Object integrity determination method and device, storage medium and electronic device
CN112633496A (en) * 2020-12-18 2021-04-09 杭州海康威视数字技术股份有限公司 Detection model processing method and device
CN112633496B (en) * 2020-12-18 2023-08-08 杭州海康威视数字技术股份有限公司 Processing method and device for detection model
CN112488073A (en) * 2020-12-21 2021-03-12 苏州科达特种视讯有限公司 Target detection method, system, device and storage medium
US20220207266A1 (en) * 2020-12-31 2022-06-30 Sensetime International Pte. Ltd. Methods, devices, electronic apparatuses and storage media of image processing
CN113597614A (en) * 2020-12-31 2021-11-02 商汤国际私人有限公司 Image processing method and device, electronic device and storage medium
CN113496046A (en) * 2021-01-18 2021-10-12 图林科技(深圳)有限公司 E-commerce logistics system and method based on block chain
CN112686207A (en) * 2021-01-22 2021-04-20 北京同方软件有限公司 Urban street scene target detection method based on regional information enhancement
CN112686207B (en) * 2021-01-22 2024-02-27 北京同方软件有限公司 Urban street scene target detection method based on regional information enhancement
CN113327312A (en) * 2021-05-27 2021-08-31 百度在线网络技术(北京)有限公司 Virtual character driving method, device, equipment and storage medium
CN113327312B (en) * 2021-05-27 2023-09-08 百度在线网络技术(北京)有限公司 Virtual character driving method, device, equipment and storage medium
CN113420604A (en) * 2021-05-28 2021-09-21 沈春华 Multi-person posture estimation method and device and electronic equipment
CN113379247A (en) * 2021-06-10 2021-09-10 鑫安利中(北京)科技有限公司 Modeling method and system of enterprise potential safety hazard tracking model
CN113379247B (en) * 2021-06-10 2024-03-29 锐仕方达人才科技集团有限公司 Modeling method and system for enterprise potential safety hazard tracking model
CN113409374A (en) * 2021-07-12 2021-09-17 东南大学 Character video alignment method based on motion registration
CN113537072B (en) * 2021-07-19 2024-03-12 之江实验室 Gesture estimation and human body analysis combined learning system based on parameter hard sharing
CN113537072A (en) * 2021-07-19 2021-10-22 之江实验室 Posture estimation and human body analysis combined learning system based on parameter hard sharing
CN113470080A (en) * 2021-07-20 2021-10-01 浙江大华技术股份有限公司 Illegal behavior identification method
CN113688734A (en) * 2021-08-25 2021-11-23 燕山大学 Old man falling detection method based on FPGA heterogeneous acceleration
CN113688734B (en) * 2021-08-25 2023-09-22 燕山大学 FPGA heterogeneous acceleration-based old people falling detection method
CN113705445A (en) * 2021-08-27 2021-11-26 深圳龙岗智能视听研究院 Human body posture recognition method and device based on event camera
CN113705445B (en) * 2021-08-27 2023-08-04 深圳龙岗智能视听研究院 Method and equipment for recognizing human body posture based on event camera
CN114387614B (en) * 2021-12-06 2023-09-01 西北大学 Complex human body posture estimation method based on double key point physiological association constraint
CN114387614A (en) * 2021-12-06 2022-04-22 西北大学 Complex human body posture estimation method based on double key point physiological association constraint
CN114842550B (en) * 2022-03-31 2023-01-24 合肥的卢深视科技有限公司 Foul behavior detection method and apparatus, electronic device and storage medium
CN114842550A (en) * 2022-03-31 2022-08-02 北京的卢深视科技有限公司 Foul behavior detection method and apparatus, electronic device and storage medium
CN114943873B (en) * 2022-05-26 2023-10-17 深圳市科荣软件股份有限公司 Method and device for classifying abnormal behaviors of staff on construction site
CN114943873A (en) * 2022-05-26 2022-08-26 深圳市科荣软件股份有限公司 Method and device for classifying abnormal behaviors of construction site personnel
CN116189229B (en) * 2022-11-30 2024-04-05 中信重工开诚智能装备有限公司 Personnel tracking method based on coal mine auxiliary transportation robot
CN116189229A (en) * 2022-11-30 2023-05-30 中信重工开诚智能装备有限公司 Personnel tracking method based on coal mine auxiliary transportation robot
CN116580245A (en) * 2023-05-29 2023-08-11 哈尔滨市科佳通用机电股份有限公司 Rail wagon bearing saddle dislocation fault identification method
CN116580245B (en) * 2023-05-29 2023-12-26 哈尔滨市科佳通用机电股份有限公司 Rail wagon bearing saddle dislocation fault identification method
CN116442393A (en) * 2023-06-08 2023-07-18 山东博硕自动化技术有限公司 Intelligent unloading method, system and control equipment for mixing plant based on video identification
CN116442393B (en) * 2023-06-08 2024-02-13 山东博硕自动化技术有限公司 Intelligent unloading method, system and control equipment for mixing plant based on video identification
CN117037272B (en) * 2023-08-08 2024-03-19 深圳市震有智联科技有限公司 Method and system for monitoring fall of old people
CN117037272A (en) * 2023-08-08 2023-11-10 深圳市震有智联科技有限公司 Method and system for monitoring fall of old people

Also Published As

Publication number Publication date
CN108710868B (en) 2020-09-04
CN108710868A (en) 2018-10-26

Similar Documents

Publication Publication Date Title
WO2019232894A1 (en) Complex scene-based human body key point detection system and method
Dhiman et al. A review of state-of-the-art techniques for abnormal human activity recognition
CN109492581B (en) Human body action recognition method based on TP-STG framework
CN111666843B (en) Pedestrian re-recognition method based on global feature and local feature splicing
WO2023082882A1 (en) Pose estimation-based pedestrian fall action recognition method and device
Ansari et al. Human detection techniques for real time surveillance: A comprehensive survey
CN110991274B (en) Pedestrian tumbling detection method based on Gaussian mixture model and neural network
Zraqou et al. Real-time objects recognition approach for assisting blind people
CN110688980A (en) Human body posture classification method based on computer vision
Gupta et al. Image-based Road Pothole Detection using Deep Learning Model
Zhou et al. A study on attention-based LSTM for abnormal behavior recognition with variable pooling
CN115527269A (en) Intelligent human body posture image identification method and system
CN114170686A (en) Elbow bending behavior detection method based on human body key points
Miao et al. Abnormal Behavior Learning Based on Edge Computing toward a Crowd Monitoring System
Avola et al. Machine learning for video event recognition
Kumar Visual object tracking using deep learning
Zhou et al. A review of multiple-person abnormal activity recognition
Chen et al. Skeleton moving pose-based human fall detection with sparse coding and temporal pyramid pooling
CN113763418B (en) Multi-target tracking method based on head and shoulder detection
CN112597842B (en) Motion detection facial paralysis degree evaluation system based on artificial intelligence
CN115100014A (en) Multi-level perception-based social network image copying and moving counterfeiting detection method
Mahjoub et al. Naive Bayesian fusion for action recognition from Kinect
CN112541403A (en) Indoor personnel falling detection method utilizing infrared camera
Wang et al. A fall detection system based on convolutional neural networks
al Atrash et al. Detecting and Counting People's Faces in Images Using Convolutional Neural Networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18922038

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18922038

Country of ref document: EP

Kind code of ref document: A1