WO2023077897A1 - Human body detection method and apparatus, electronic device, and computer-readable storage medium - Google Patents

Info

Publication number
WO2023077897A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
feature
structural
image
features
Prior art date
Application number
PCT/CN2022/111687
Other languages
French (fr)
Chinese (zh)
Inventor
罗静
郭宇鹏
王晓
毛少将
雷庆庆
Original Assignee
通号通信信息集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 通号通信信息集团有限公司
Publication of WO2023077897A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • Embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular, to a human body detection method and device, electronic equipment, and a computer-readable storage medium.
  • Human body posture recognition mainly concerns describing human body posture and predicting human behavior.
  • The recognition process identifies human body movements in a specified image or video from changes in the positions of the body's joint points.
  • a human body detection result of the image to be detected is determined according to the structural feature and the color feature, and the human body detection result includes human body frame and key point information of the image to be detected.
  • the detection module is configured to determine the human body detection result of the image to be detected according to the structural feature and the color feature, and the human body detection result includes human body frame and key point information of the image to be detected.
  • An embodiment of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored on the memory and operable on the processor; when the processor executes the computer program, any human body detection method of the embodiments of the present disclosure is implemented.
  • an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, any human body detection method in the embodiments of the present disclosure is implemented.
  • The structural features of the image to be detected are extracted; the human body area in the image to be detected is determined according to the structural features; the color features of the human body area are extracted; and the human body detection result of the image to be detected is determined according to the structural features and color features, where the human body detection result includes the human body frame and key point information of the image to be detected.
  • This method uses structural features and color features together to detect human bodies, so that high-accuracy human body detection results can be obtained. The human body detection model corresponding to this method can be trained using images annotated with human body frames, without using images annotated with key point coordinates, which avoids manual labeling of key point coordinates.
  • FIG. 2 is a flow chart of a method for extracting structural features provided by an embodiment of the present disclosure.
  • FIG. 3 is a flowchart of a human body detection model training method provided by an embodiment of the present disclosure.
  • Fig. 5 is a block diagram of a human body detection device provided by an embodiment of the present disclosure.
  • FIG. 6 is a block diagram of an electronic device used to implement the human body detection method of the embodiment of the present disclosure.
  • Action recognition has long been one of the main application scenarios of artificial intelligence, for example fall recognition, fight detection, and climbing detection.
  • The core of this type of algorithm includes key point (or key bone point) detection and action classification.
  • The accuracy of action classification depends on the accuracy of key point detection.
  • The mainstream key point detection methods include OpenPose, MoveNet, etc., all of which regress key point coordinates from features.
  • Although the above key point detection methods achieve high recognition accuracy in most experimental scenarios, in actual application scenarios the recognition results are often unsatisfactory because the conditions are more complex.
  • For example, the accuracy of key point detection results is low when crowds are dense and occlusion is severe.
  • The human body detection algorithm usually uses a large amount of training data annotated with key point coordinates to train an initial human body detection model. After the trained human body detection model is obtained, the picture to be detected is input into the model, and after processing the model outputs the human body detection result, which includes the key point coordinates of the picture to be detected.
  • The training of the human body detection model relies on a dataset annotated with key point coordinates. Usually, a picture contains many key points whose coordinates must be annotated, and different human body parts look similar, which makes key point coordinates difficult to annotate and consumes a lot of time and manpower.
  • The embodiments of the present disclosure provide a human body detection method and device whose corresponding human body detection model can be trained using sample images annotated with human body frames.
  • Compared with training on sample images annotated with key point coordinates, this effectively reduces operational complexity and saves time and labor costs.
  • an embodiment of the present disclosure provides a human body detection method.
  • the human body detection method in the embodiments of the present disclosure can be executed by a corresponding human body detection device, which can be implemented in software and/or hardware, and can generally be integrated into electronic equipment.
  • FIG. 1 is a flow chart of a human body detection method provided by an embodiment of the present disclosure.
  • the human body detection method of the embodiment of the present disclosure includes:
  • Step S101: extracting structural features of the image to be detected.
  • The structural features are used to characterize the structural information of the image to be detected.
  • For a human body in the image to be detected, the structural features may be features of parts such as the head, elbows, joints, and wrists.
  • structural features can be extracted from the image to be detected by convolution.
  • the structural features of the image to be detected include first structural features and second structural features
  • the step of extracting the structural features of the image to be detected includes:
  • First, feature extraction is performed on the image to be detected based on the preset first convolution kernel to obtain the first structural feature; second, feature extraction is performed on the first structural feature based on the preset second convolution kernel to obtain the second structural feature.
  • The convolution kernel can be regarded as a filter matrix, which is used to extract features from the convolved image.
  • The second structural feature is obtained by further convolving the first structural feature, so it is at a higher level (or feature scale) and is more global than the first structural feature.
  • the feature levels corresponding to the first structural feature and the second structural feature are related to the convolution kernel size and convolution step size used when extracting the feature.
  • the first structural feature includes eye features, nose features, and mouth features of a person in the image to be detected, and correspondingly, the second structural feature may be a face feature.
  • the first structural feature includes head features, elbow features, hand features, and leg features of a person in the image to be detected, and correspondingly, the second structural feature may be the overall structural feature of the person.
  • first structural feature and second structural feature are only examples, and can be flexibly set according to actual needs, which is not limited in the present disclosure.
  • The first convolution kernels correspond to the same or a similar feature extraction scale.
  • The first convolution kernel includes multiple convolution kernel clusters.
  • The convolution kernels in the same convolution kernel cluster are used to extract the same specific structural feature.
  • The third structural features extracted by the convolution kernels of the same convolution kernel cluster are superimposed to obtain the final first structural features.
  • feature superposition is performed on features of the same scale in units of feature types, so as to enhance the expression of features.
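The two-stage extraction in step S101 can be illustrated with a minimal sketch. The image, both kernels, and the strides below are hypothetical values chosen for readability, and `conv2d` is an illustrative helper, not a function from the disclosure. The sketch only shows that convolving the first structural feature again yields a smaller, more global second structural feature.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 6x6 grayscale image: a bright vertical bar on a dark background.
image = [[1 if c in (2, 3) else 0 for c in range(6)] for _ in range(6)]

# Hypothetical first kernel: responds to dark-to-bright edges (left to right).
first_kernel = [[-1, 1], [-1, 1]]
# Hypothetical second kernel: aggregates first-level responses over a wider area.
second_kernel = [[1, 1], [1, 1]]

# First structural feature: a 5x5 map of local edge responses.
first_structural = conv2d(image, first_kernel, stride=1)
# Second structural feature: a 2x2 map computed from the first feature,
# so each value summarizes a larger part of the original image.
second_structural = conv2d(first_structural, second_kernel, stride=2)
```

The feature level depends on the kernel size and stride, as the text notes: here the 6x6 input shrinks to 5x5 and then to 2x2.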
  • Step S102: determining the human body area in the image to be detected according to the structural features.
  • the image to be detected includes a foreground area and a background area.
  • the foreground area specifically refers to the human body area
  • the background area refers to the area composed of objects, items, etc. other than the human body area.
  • The step of determining the human body area in the image to be detected according to the structural feature includes: regressing the structural feature onto the image to be detected, and determining the human body area in the image to be detected.
  • The structural features include a first structural feature and a second structural feature.
  • The first structural feature and the second structural feature are regressed onto the image to be detected to obtain a first human body region corresponding to the first structural feature and a second human body region corresponding to the second structural feature.
  • The step of determining the human body region in the image to be detected includes:
  • First, the structural features are filtered according to a preset structural feature threshold to obtain filtered structural features, where the structural feature threshold is used to filter out the background structural features among the structural features; second, the filtered structural features are regressed onto the image to be detected to determine the human body region in the image to be detected.
  • The structural feature threshold may be obtained from experience, from statistical data, or through training, which is not limited in the present disclosure.
  • The first structural feature is filtered according to the first structural feature threshold to obtain the first filtered structural feature; the first filtered structural feature is regressed onto the image to be detected to obtain the first human body region.
  • The second structural feature is filtered according to the second structural feature threshold to obtain the second filtered structural feature; the second filtered structural feature is regressed onto the image to be detected to obtain the second human body region.
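The filter-then-regress idea of step S102 can be sketched as follows. This is a simplified illustration under stated assumptions: the feature map comes from a single unpadded convolution with a known stride and kernel size, responses below the threshold are treated as background, and the region is the bounding rectangle of the surviving responses. The function names, the threshold, and the feature values are hypothetical.

```python
def filter_features(feature_map, threshold):
    """Zero out responses below the threshold (assumed to be background)."""
    return [[v if v >= threshold else 0 for v in row] for row in feature_map]


def regress_to_region(filtered_map, stride, kernel_size):
    """Map surviving feature cells back to image pixels.

    Returns (top, left, bottom, right) with exclusive bottom/right,
    or None when nothing survives the filter.
    Assumes one unpadded convolution, so cell (r, c) covers image rows
    r*stride .. r*stride + kernel_size and the matching columns.
    """
    hits = [
        (r, c)
        for r, row in enumerate(filtered_map)
        for c, v in enumerate(row)
        if v > 0
    ]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (
        min(rows) * stride,
        min(cols) * stride,
        max(rows) * stride + kernel_size,
        max(cols) * stride + kernel_size,
    )


# Hypothetical 4x4 structural feature map with a strong central response.
feature_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.0, 0.7, 0.9, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
filtered = filter_features(feature_map, threshold=0.5)
region = regress_to_region(filtered, stride=2, kernel_size=3)
```

Filtering before regression keeps weak background responses from enlarging the region, which matches the motivation given for the structural feature threshold.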
  • Step S103: extracting color features of the human body region.
  • The color feature is used to represent the color information of the human body area.
  • For example, the color feature is a feature based on the grayscale of an image.
  • For another example, the color feature is a feature based on the RGB (red, green, blue) color channels.
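A minimal sketch of step S103, assuming the color feature is simply the per-channel mean and the mean grayscale value inside the human body region. The disclosure does not specify the exact color descriptor, so this choice, the function name, and the sample image are assumptions. The grayscale conversion uses the common ITU-R BT.601 luma weights.

```python
def color_features(image, region):
    """Mean RGB and mean grayscale of the pixels inside a region.

    image  : nested list of (R, G, B) tuples, indexed as image[row][col]
    region : (top, left, bottom, right) with exclusive bottom/right
    """
    top, left, bottom, right = region
    pixels = [
        image[r][c]
        for r in range(top, bottom)
        for c in range(left, right)
    ]
    n = len(pixels)
    mean_rgb = tuple(sum(p[ch] for p in pixels) / n for ch in range(3))
    # BT.601 luma weights for converting RGB to grayscale.
    mean_gray = sum(0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2] for p in pixels) / n
    return {"mean_rgb": mean_rgb, "mean_gray": mean_gray}


# Hypothetical 4x4 image: a 2x2 colored patch on a black background.
body_color = (200, 100, 50)
image = [
    [body_color if 1 <= r <= 2 and 1 <= c <= 2 else (0, 0, 0) for c in range(4)]
    for r in range(4)
]
features = color_features(image, region=(1, 1, 3, 3))
```

Because only pixels inside the region are read, the background does not dilute the color statistics, which is why the region is determined before color extraction.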
  • the step of determining the human body detection result of the image to be detected includes:
  • The structural feature and the color feature can be concatenated using a concatenation (Concat) function.
  • Activation functions include, but are not limited to, Sigmoid, Tanh, and ReLU.
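The concatenation and activation steps can be sketched with flat feature vectors. The helper names and the sample values are hypothetical; in a real model the connected features would be tensors and the activation would be followed by layers that output the human body frame and key points.

```python
import math


def concat(structural, color):
    """Join a structural feature vector and a color feature vector."""
    return list(structural) + list(color)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


# The three activation functions named in the text.
ACTIVATIONS = {"sigmoid": sigmoid, "tanh": math.tanh, "relu": relu}


def activate(features, name="sigmoid"):
    """Apply the chosen activation element-wise."""
    fn = ACTIVATIONS[name]
    return [fn(v) for v in features]


# Hypothetical feature values.
connected = concat([0.0, 2.0], [-1.0])
relu_out = activate(connected, "relu")
sigmoid_out = activate(connected, "sigmoid")
tanh_out = activate(connected, "tanh")
```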
  • Step S202: superimposing the third structural features corresponding to the same convolution kernel cluster to obtain the structural features corresponding to the convolution kernel cluster.
  • The first structural feature includes structural features corresponding to multiple convolution kernel clusters.
  • Superimposing features in this way enhances the structural features, so that better structural features are obtained.
  • Step S203: performing feature extraction on the first structural feature based on the preset second convolution kernel to obtain the second structural feature.
  • For example, the first convolution kernel includes 100 convolution kernels, which are divided into a first convolution kernel cluster, a second convolution kernel cluster, and a third convolution kernel cluster. The convolution kernels belonging to the first cluster are more effective at extracting elbow features, those belonging to the second cluster are more effective at extracting wrist features, and those belonging to the third cluster are more effective at extracting head features.
  • For the first convolution kernel cluster, the third structural features extracted by its convolution kernels are superimposed to obtain the first structural feature corresponding to the first convolution kernel cluster; for the second convolution kernel cluster, the third structural features extracted by its convolution kernels are superimposed to obtain the first structural feature corresponding to the second convolution kernel cluster; for the third convolution kernel cluster, the third structural features extracted by its convolution kernels are superimposed to obtain the first structural feature corresponding to the third convolution kernel cluster.
  • Since the first convolution kernel cluster is more effective at extracting elbow features, the first structural feature corresponding to the first convolution kernel cluster reflects the elbow features better than any single third structural feature. Similarly, the first structural feature corresponding to the second convolution kernel cluster better reflects the wrist features, and the first structural feature corresponding to the third convolution kernel cluster better reflects the head features.
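The per-cluster superposition in the example above can be sketched as follows. This assumes that "superimposing" means element-wise addition of feature maps of the same size, which the disclosure does not state explicitly. The cluster labels and feature values are hypothetical, and five kernels stand in for the 100 in the example.

```python
from collections import defaultdict


def superimpose_by_cluster(feature_maps, cluster_ids):
    """Sum, element-wise, the feature maps whose kernels share a cluster.

    feature_maps : one 2-D map (nested list) per convolution kernel
    cluster_ids  : cluster label of each kernel, in the same order
    Returns a dict mapping cluster label to the superimposed map.
    """
    grouped = defaultdict(list)
    for fmap, cid in zip(feature_maps, cluster_ids):
        grouped[cid].append(fmap)

    merged = {}
    for cid, maps in grouped.items():
        rows, cols = len(maps[0]), len(maps[0][0])
        merged[cid] = [
            [sum(m[r][c] for m in maps) for c in range(cols)]
            for r in range(rows)
        ]
    return merged


# Hypothetical third structural features, one 2x2 map per kernel.
third_features = [
    [[1, 0], [0, 0]],  # kernel 0, elbow cluster
    [[2, 0], [0, 1]],  # kernel 1, elbow cluster
    [[0, 3], [0, 0]],  # kernel 2, wrist cluster
    [[0, 0], [4, 0]],  # kernel 3, head cluster
    [[0, 0], [1, 0]],  # kernel 4, head cluster
]
cluster_ids = ["elbow", "elbow", "wrist", "head", "head"]

first_features = superimpose_by_cluster(third_features, cluster_ids)
```

Responses that several kernels of a cluster agree on add up, while isolated responses stay small, which is one way to read the feature enhancement effect described in the text.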
  • the human body detection method provided by the embodiments of the present disclosure may be implemented by a preset human body detection model.
  • the human body detection model includes a model constructed based on a neural network.
  • the human body detection model is trained through a preset training set, wherein the training set includes sample images and human frame annotation information of the sample images.
  • In the related art, the training set used for training the human body detection model includes sample images and key point coordinate annotation information of the sample images.
  • Through such training, the human body detection model learns to annotate key point coordinates, so that it can mark the key point coordinates of the image to be detected.
  • However, the key point coordinate annotation of sample images usually relies on manual labeling, which is complex and takes a lot of time and manpower.
  • In the embodiments of the present disclosure, model training is performed using a training set that includes sample images and human body frame annotation information of the sample images, and the model learns to annotate human body frames during the training process.
  • The human body detection model annotates human body frames by recognizing and extracting features in the image. The more accurate the recognized and extracted features, the more accurate the resulting human body frame annotation and, correspondingly, the more accurate the key point coordinates determined from those features.
  • The human body detection model provided by the embodiments of the present disclosure can be trained using sample images and their human body frame annotation information as the training set and still obtain the ability to annotate key point coordinates. The key point coordinates of the sample images do not need to be annotated, which simplifies the operation and saves a lot of time and manpower.
  • FIG. 3 is a flowchart of a human body detection model training method provided by an embodiment of the present disclosure.
  • Step S301: input the training set into the initial human body detection model, and extract the detailed features of the sample image through the first convolutional network.
  • Step S302: extract the first structural feature of the sample image through the second convolutional network, and filter the first structural feature according to the first structural feature threshold to obtain the first filtered structural feature.
  • Step S303: regress the first filtered structural feature onto the sample image, determine the first human body region, and extract the color features of the first human body region to obtain the first color feature.
  • Step S304: use the third convolutional network to perform global feature extraction on the first structural feature to obtain the second structural feature, and filter the second structural feature according to the second structural feature threshold to obtain the second filtered structural feature.
  • Step S307: input the first connection feature and the second connection feature into the activation layer, and obtain the human body detection result of the sample image through activation processing.
  • The human body detection result includes the human body frame and key point information of the sample image.
  • Step S308: adjust the parameters of the human body detection model according to the human body detection result, and use the adjusted human body detection model to perform iterative training until the preset stop condition is met, then stop the model training.
  • the stop condition may be a condition related to detection accuracy and/or training times, which is not limited in the present disclosure.
  • the human body detection model obtained after the training is stopped is regarded as a model that meets the requirements, and human body detection can be performed based on the human body detection model.
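The iterate-until-stop logic of step S308 can be sketched as a small loop. This sketch only illustrates the stop condition (a target accuracy and/or a maximum number of training iterations). The toy "model" is a single score threshold adjusted by a fixed step, which is a stand-in and not the parameter update used by the disclosed model. All names and values are hypothetical.

```python
def train(model_step, evaluate, max_iterations=100, target_accuracy=0.95):
    """Repeat parameter updates until the accuracy target or the
    iteration budget is reached. Returns (iterations, accuracy)."""
    accuracy = evaluate()
    iterations = 0
    while accuracy < target_accuracy and iterations < max_iterations:
        model_step()          # adjust model parameters
        iterations += 1
        accuracy = evaluate() # re-check the stop condition
    return iterations, accuracy


# Toy stand-in: (score, is_human) pairs and one adjustable threshold.
samples = [(0.2, False), (0.3, False), (0.7, True), (0.9, True)]
state = {"threshold": 0.0}


def evaluate():
    """Fraction of samples classified correctly at the current threshold."""
    correct = sum(
        (score >= state["threshold"]) == label for score, label in samples
    )
    return correct / len(samples)


def model_step():
    """Hypothetical parameter update: raise the threshold by a fixed step."""
    state["threshold"] += 0.25


iterations, accuracy = train(model_step, evaluate, max_iterations=10)
```

Keeping the stop condition separate from the update rule lets either the accuracy target or the iteration limit end training, matching the "and/or" wording in the text.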
  • Among the convolution kernel clusters obtained by clustering, some clusters are less effective at extracting structural features (for example, clusters that mainly extract noise features). These clusters can be filtered out to improve the accuracy of structural feature extraction.
  • In this way, the number of model parameters can be reduced and the generalization ability of the model can be improved, so that the human body detection model obtains good human body detection results for different application scenarios and different types of pictures.
  • Fig. 4 is a schematic diagram of a training process of a human body detection model provided by an embodiment of the present disclosure.
  • The training set includes multiple sample images, and the sample images are annotated with human body frames.
  • The detailed features of the sample image are first extracted through a convolution operation, the first structural features are then extracted on the basis of the detailed features, and the first color feature is extracted from the sample image according to the first structural feature and the first structural feature threshold.
  • The second structural feature is further extracted, and the global color feature is extracted from the sample image according to the second structural feature and the second structural feature threshold.
  • the convolution kernels for extracting the first structural features can be clustered to obtain convolution kernel clusters.
  • The convolution kernels belonging to the same convolution kernel cluster are highly similar, which shows in that they are all more effective at extracting a certain type of structural feature. Therefore, the first structural features extracted by the convolution kernels in the same convolution kernel cluster can be superimposed to obtain enhanced first structural features, from which second structural features can be extracted with higher accuracy.
  • The step division of the above methods is only for clarity of description. During implementation, steps can be combined into one step, or a step can be split into multiple steps; as long as the same logical relationship is included, they are all within the protection scope of the present disclosure. Adding insignificant modifications or introducing insignificant designs to the algorithm or process without changing its core design is also within the scope of protection of this application.
  • The first extraction module 501 is configured to extract structural features of the image to be detected, and the structural features are used to characterize structural information of the image to be detected.
  • For a human body in the image to be detected, the structural features may be features of parts such as the head, elbows, joints, and feet.
  • structural features can be extracted from the image to be detected by convolution.
  • the structural features of the image to be detected include first structural features and second structural features
  • the first extraction module 501 includes a first extraction unit and a second extraction unit.
  • The first extraction unit is used to perform feature extraction on the image to be detected based on the preset first convolution kernel to obtain the first structural feature.
  • The second extraction unit is used to perform feature extraction on the first structural feature based on the preset second convolution kernel to obtain the second structural feature.
  • For example, the first structural feature includes eye features, nose features, and mouth features, and the second structural feature is a human face feature.
  • For another example, the first structural feature includes the head feature, elbow feature, hand feature, leg feature, and foot feature of a certain person in the image to be detected, and the second structural feature is the overall structural feature of the person.
  • first structural feature and second structural feature are only examples, and can be flexibly set according to actual needs, which is not limited in the present disclosure.
  • the area determining module 502 is configured to determine the human body area in the image to be detected according to the structural features.
  • the image to be detected includes a foreground area and a background area.
  • the foreground area specifically refers to the human body area
  • the background area refers to the area composed of objects, items, etc. other than the human body area.
  • the area determination module 502 includes a regression unit. Wherein, the regression unit is used for regressing structural features into the image to be detected, and determining the human body area in the image to be detected.
  • the structural features include a first structural feature and a second structural feature. Based on the regression unit, the first structural feature and the second structural feature are respectively regressed into the image to be detected to obtain a first human body region corresponding to the first structural feature and a second human body region corresponding to the second structural feature.
  • Take the case where the structural feature is the head feature of a human body as an example. If the head feature is regressed directly onto the image to be detected, the corresponding area in the image is usually a regular rectangle that contains the head. In other words, the region determined by directly regressing structural features includes both the head region and part of the background region, so the head region cannot be framed accurately in the image to be detected. For this reason, before the structural features are regressed onto the image to be detected, they are first filtered to remove the background structural features and obtain the filtered structural features. When the filtered structural features are regressed onto the image to be detected, a regression result containing only the human body area is obtained, which frames the human body structure accurately and provides the region on which the subsequent color feature extraction is based.
  • The second extraction module 503 is configured to extract color features of the human body region, and the color features are used to represent color information of the human body region.
  • For example, the color feature is a feature based on image grayscale; for another example, the color feature is a feature based on RGB color channels.
  • the human body detection device further includes an early warning module.
  • The early warning module includes a posture determination unit and an early warning signal transmitting unit.
  • The posture determination unit is used to determine the human body posture information according to the human body frame and key point information of the image to be detected after the human body detection result of the image to be detected is determined; the early warning signal transmitting unit is used to issue an early warning signal when a preset early warning event occurs.
  • For example, human body detection is performed on surveillance video to obtain human body detection results. When the human body posture is determined to be a falling posture according to the human body frame and key point information, it is known that a person has fallen. An early warning signal can therefore be sent to staff terminals or broadcast terminals, so that the relevant staff can carry out emergency treatment or start emergency plans in time.
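The fall-detection example can be sketched with a simple geometric rule. The rule used here (the body frame is wider than it is tall, and the head-to-ankle axis is closer to horizontal than vertical) is an illustrative assumption; the disclosure does not specify how the posture is determined. The key point names, the coordinates, and the `send` callback are hypothetical.

```python
def is_fall_posture(body_frame, keypoints):
    """Heuristic posture check (an assumption, not the disclosed method).

    body_frame : (left, top, right, bottom) in pixels
    keypoints  : dict with (x, y) entries for "head" and "ankle"
    """
    left, top, right, bottom = body_frame
    wider_than_tall = (right - left) > (bottom - top)

    head_x, head_y = keypoints["head"]
    ankle_x, ankle_y = keypoints["ankle"]
    axis_is_horizontal = abs(head_y - ankle_y) < abs(head_x - ankle_x)

    return wider_than_tall and axis_is_horizontal


def early_warning(detections, send):
    """Call `send` once per detection judged to be a fall; return the count."""
    warnings = 0
    for body_frame, keypoints in detections:
        if is_fall_posture(body_frame, keypoints):
            send(f"fall detected in body frame {body_frame}")
            warnings += 1
    return warnings


# Hypothetical detection results for two people in one video frame.
standing = ((10, 10, 50, 130), {"head": (30, 15), "ankle": (30, 125)})
lying = ((10, 100, 130, 140), {"head": (15, 120), "ankle": (125, 125)})

messages = []  # stands in for a staff terminal or broadcast terminal
sent = early_warning([standing, lying], send=messages.append)
```

Passing the sender as a callback keeps the posture rule independent of how the signal is delivered, which mirrors the split between the posture determination unit and the early warning signal transmitting unit.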
  • The structural features of the image to be detected are extracted by the first extraction module; the area determination module determines the human body area in the image to be detected according to the structural features; the second extraction module extracts the color features of the human body area; and the detection module determines the human body detection result of the image to be detected according to the structural features and color features.
  • The device uses structural features and color features together to perform human body detection and can obtain human body detection results with high accuracy. The human body detection model corresponding to the device can be trained using images annotated with human body frames, without using images annotated with key point coordinates, which avoids manual labeling of key point coordinates.
  • The functional modules/units in the systems and devices may be implemented as software, firmware, hardware, or an appropriate combination thereof.
  • The division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation.
  • Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
  • Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable program instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technologies, portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer.
  • communication media typically embodies computer-readable program instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a human body detection method and apparatus, an electronic device, and a storage medium. The method comprises: extracting a structural feature of an image to be detected; determining a human body area in said image according to the structural feature; extracting a color feature of the human body area; and determining a human body detection result of said image according to the structural feature and the color feature, the human body detection result comprising a human body frame and key point information of said image.

Description

Human body detection method and apparatus, electronic device, and computer-readable storage medium
Technical Field
Embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular to a human body detection method and apparatus, an electronic device, and a computer-readable storage medium.
Background
In recent years, with the rapid development of artificial intelligence and neural networks, human pose recognition has been widely used in a variety of application scenarios. Human pose recognition is mainly concerned with describing human poses and predicting human behavior; the recognition process identifies human actions in a given image or video from changes in the positions of the body's joint points.
Summary
Embodiments of the present disclosure provide a human body detection method and apparatus, an electronic device, and a computer-readable storage medium, which can identify a human body simply and accurately.
In a first aspect, an embodiment of the present disclosure provides a human body detection method, including: extracting structural features of an image to be detected, where the structural features characterize structural information of the image to be detected;
determining a human body region in the image to be detected according to the structural features;
extracting color features of the human body region, where the color features characterize color information of the human body region;
determining a human body detection result of the image to be detected according to the structural features and the color features, where the human body detection result includes a human body frame and key point information of the image to be detected.
In a second aspect, an embodiment of the present disclosure provides a human body detection apparatus, including: a first extraction module configured to extract structural features of an image to be detected, where the structural features characterize structural information of the image to be detected;
a region determination module configured to determine a human body region in the image to be detected according to the structural features;
a second extraction module configured to extract color features of the human body region, where the color features characterize color information of the human body region;
a detection module configured to determine a human body detection result of the image to be detected according to the structural features and the color features, where the human body detection result includes a human body frame and key point information of the image to be detected.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements any human body detection method of the embodiments of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements any human body detection method of the embodiments of the present disclosure.
In the embodiments of the present disclosure, structural features of the image to be detected are extracted; a human body region in the image to be detected is determined according to the structural features; color features of the human body region are extracted; and a human body detection result of the image to be detected is determined according to the structural features and the color features, where the human body detection result includes a human body frame and key point information of the image to be detected. The method uses structural features and color features together for human body detection, so that a more accurate human body detection result can be obtained. Moreover, the corresponding human body detection model can be trained using images annotated only with human body frames; images annotated with key point coordinates are not required, which avoids manual annotation of key point coordinates.
It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
FIG. 1 is a flowchart of a human body detection method provided by an embodiment of the present disclosure.
FIG. 2 is a flowchart of a structural feature extraction method provided by an embodiment of the present disclosure.
FIG. 3 is a flowchart of a human body detection model training method provided by an embodiment of the present disclosure.
FIG. 4 is a schematic diagram of the training process of a human body detection model provided by an embodiment of the present disclosure.
FIG. 5 is a block diagram of a human body detection apparatus provided by an embodiment of the present disclosure.
FIG. 6 is a block diagram of an electronic device used to implement the human body detection method of the embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present disclosure and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present disclosure rather than the entire structure.
Action recognition has long been one of the main application scenarios of artificial intelligence, for example fall recognition, fight detection, and climbing detection. The core of such algorithms consists of key point (or key skeleton point) detection and action classification, and the accuracy of action classification depends on the accuracy of key point detection. In the related art, mainstream key point detection methods include OpenPose and MoveNet, both of which regress key point coordinates from features.
OpenPose is an open-source library for human pose recognition developed at Carnegie Mellon University in the United States. It is based on convolutional neural networks and supervised learning and is built on the Caffe framework (Convolutional Architecture for Fast Feature Embedding). It can estimate body movements, facial expressions, finger motion, and other poses, works for both single-person and multi-person scenes, is highly robust, and is the world's first real-time multi-person two-dimensional pose estimation application based on deep learning. MoveNet is a human pose detection model released by Google, with two variants, Lightning and Thunder. The former is suited to latency-critical applications, while the latter improves recognition accuracy at the cost of real-time performance.
Although the above key point detection methods achieve high recognition accuracy in most experimental scenarios, actual application scenarios are more complex, and the recognition results are often unsatisfactory. For example, in public places such as high-speed rail stations and subway stations, crowds are dense and occlusion is severe, so the accuracy of key point detection is low.
In the related art, a human body detection algorithm usually trains an initial human body detection model on a large amount of training data annotated with key point coordinates. After the trained model is obtained, the image to be detected is input into the model, which processes the input and outputs a human body detection result that includes the key point coordinates of the image to be detected. In this approach, training the human body detection model depends on a dataset annotated with key point coordinates. In general, each image requires a large number of key point coordinates to be annotated, and different body parts look similar, which makes key point annotation difficult and consumes a great deal of time and labor.
In view of this, embodiments of the present disclosure provide a human body detection method and apparatus whose corresponding human body detection model can be trained using sample images annotated only with human body frames. Compared with training on sample images annotated with key point coordinates, this effectively reduces operational complexity and saves time and labor costs.
In a first aspect, an embodiment of the present disclosure provides a human body detection method.
The human body detection method of the embodiments of the present disclosure may be executed by a corresponding human body detection apparatus, which may be implemented in software and/or hardware and may generally be integrated in an electronic device.
FIG. 1 is a flowchart of a human body detection method provided by an embodiment of the present disclosure. Referring to FIG. 1, the human body detection method of the embodiment of the present disclosure includes:
Step S101: extract structural features of the image to be detected.
The structural features characterize the structural information of the image to be detected. For example, when the image to be detected contains a human body, the structural features may be structural features of the body such as the head, elbows, joints, and wrists. In some specific implementations, the structural features may be extracted from the image to be detected by convolution.
In some embodiments, the structural features of the image to be detected include a first structural feature and a second structural feature, and the step of extracting the structural features of the image to be detected includes:
first, performing feature extraction on the image to be detected based on a preset first convolution kernel to obtain the first structural feature; second, performing feature extraction on the first structural feature based on a preset second convolution kernel to obtain the second structural feature.
A convolution kernel can be regarded as a filter matrix used to extract features from the image being convolved. In this embodiment, the second structural feature is obtained by further convolving the first structural feature; relative to the first structural feature, it is a feature of a higher level (or feature scale) and is more global. The feature levels of the first and second structural features depend on the size of the convolution kernel and the convolution stride used to extract them.
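The two-stage extraction described above can be sketched as follows. This is a minimal illustration in plain Python rather than the disclosed model: the helper name `conv2d`, the kernel values, the kernel sizes, and the stride are all assumptions chosen for readability.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i * stride + u][j * stride + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 6x6 grayscale image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Assumed first kernel: a 3x3 vertical-edge filter that responds to local structure.
first_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
# Assumed second kernel: a 2x2 averaging filter applied to the first feature map.
second_kernel = [[0.25, 0.25], [0.25, 0.25]]

first_feature = conv2d(image, first_kernel)                      # 4x4 map
second_feature = conv2d(first_feature, second_kernel, stride=2)  # 2x2 map
```

With these assumed sizes, each value of `second_feature` depends on a 4x4 patch of the input image, whereas each value of `first_feature` depends on a 3x3 patch, which is one way to read "higher level and more global".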
For example, the first structural feature includes the eye features, nose features, and mouth features of a person in the image to be detected; correspondingly, the second structural feature may be a face feature. As another example, the first structural feature includes the head features, elbow features, hand features, and leg features of a person in the image to be detected; correspondingly, the second structural feature may be the overall structural feature of that person.
It should be noted that the above descriptions of the first structural feature and the second structural feature are only examples; they can be set flexibly according to actual needs, and the present disclosure does not limit this.
It should also be noted that, when extracting features in the related art, multiple features extracted at the same scale are usually concatenated to obtain a complete feature map, which is then fused with feature maps of other scales. In contrast, in the embodiments of the present disclosure, for feature maps of the same scale, the features are first superimposed by convolution kernel cluster (that is, by feature type) to obtain enhanced features, and only then are operations such as feature concatenation and feature fusion performed.
Exemplarily, the first convolution kernels correspond to the same or similar feature extraction scales and include multiple convolution kernel clusters. The convolution kernels in the same cluster are used to extract the same specific structural feature, and the third structural features extracted by the same cluster are superimposed to obtain the final first structural feature. In other words, in the embodiments of the present disclosure, features of the same scale are superimposed per feature type in order to strengthen the feature representation. For example, the first convolution kernels are kernels for extracting local structural features of the face, which include eye features, nose features, and mouth features. The first convolution kernel cluster is used to extract eye features, the second convolution kernel cluster is used to extract nose features, and the third convolution kernel cluster is used to extract mouth features. After these features are obtained, the multiple eye features extracted by the first cluster are superimposed to obtain the final eye feature; similarly, the multiple nose features extracted by the second cluster are superimposed to obtain the final nose feature, and the multiple mouth features extracted by the third cluster are superimposed to obtain the final mouth feature.
That is to say, when obtaining the first structural feature, the embodiments of the present disclosure do not fuse or concatenate features of different scales; instead, multiple features of the same scale and the same type are superimposed to obtain an enhanced feature.
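A minimal sketch of the per-cluster superposition follows. The disclosure does not define "superimpose" precisely; element-wise summation is assumed here, and the function name `superimpose`, the cluster labels, and the 2x2 feature maps are hypothetical.

```python
def superimpose(feature_maps):
    """Element-wise sum of equally sized 2D feature maps (assumed meaning of 'superimpose')."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(fm[i][j] for fm in feature_maps) for j in range(width)]
        for i in range(height)
    ]


# Hypothetical third structural features: one 2x2 map per kernel, grouped by cluster.
third_features = {
    "eye": [[[1, 0], [0, 0]], [[2, 0], [0, 1]]],
    "nose": [[[0, 1], [0, 0]]],
    "mouth": [[[0, 0], [1, 1]], [[0, 0], [2, 0]], [[0, 0], [0, 3]]],
}

# One enhanced map per cluster; together they form the first structural feature.
first_structural_feature = {
    name: superimpose(maps) for name, maps in third_features.items()
}
```

Only maps of the same cluster (same scale, same feature type) are summed; no map is combined with one from another cluster at this stage.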
Step S102: determine the human body region in the image to be detected according to the structural features.
The image to be detected includes a foreground region and a background region. In human body detection scenarios, the foreground region refers specifically to the human body region, and the background region refers to the region outside the human body region, made up of objects, items, and the like. During human body detection, the features of the background region are of little interest. The background region therefore needs to be removed from the image to be detected, or the foreground region needs to be extracted from it, so that the foreground region can be further analyzed and processed.
In some embodiments, the step of determining the human body region in the image to be detected according to the structural features includes: regressing the structural features back onto the image to be detected to determine the human body region in the image to be detected.
In some specific implementations, the structural features include a first structural feature and a second structural feature. The first structural feature and the second structural feature are regressed back onto the image to be detected to obtain a first human body region corresponding to the first structural feature and a second human body region corresponding to the second structural feature.
For the above embodiment, take the case where the structural feature is the head feature of a human body as an example. If the head feature is regressed directly onto the image to be detected, the corresponding region in the image is usually a regular rectangle within which the head is located. In other words, a region determined by directly regressing the structural feature contains both the head region and part of the background region, so the head region cannot be selected accurately from the image to be detected. For this reason, before the structural features are regressed onto the image to be detected, they are first filtered to remove the background structural features, yielding filtered structural features. When the filtered structural features are regressed onto the image to be detected, a regression result containing only the human body region can be obtained, so that the human body structure is selected accurately and a region basis is provided for the subsequent extraction of color features.
In some embodiments, the step of determining the human body region in the image to be detected according to the structural features includes:
first, filtering the structural features according to a preset structural feature threshold to obtain filtered structural features, where the structural feature threshold is used to filter out the background structural features among the structural features; second, regressing the filtered structural features onto the image to be detected to determine the human body region in the image to be detected. The structural feature threshold may be obtained from experience, from statistical data, or through training, and the present disclosure does not limit this.
In some specific implementations, the structural feature threshold includes a first structural feature threshold and a second structural feature threshold. The step of determining the human body region in the image to be detected according to the structural features includes:
first, filtering the first structural feature according to the first structural feature threshold to obtain a first filtered structural feature, and regressing the first filtered structural feature onto the image to be detected to obtain the first human body region; second, filtering the second structural feature according to the second structural feature threshold to obtain a second filtered structural feature, and regressing the second filtered structural feature onto the image to be detected to obtain the second human body region.
It should be noted that, because different structural feature thresholds are set for different structural features, the filtered structural features obtained with these thresholds are more accurate and reasonable, so that a more accurate human body region can be obtained.
It should also be noted that algorithms such as linear regression, K-nearest neighbor (K-NN) regression, decision tree regression, and random forest regression may be used when regressing the structural features onto the image to be detected, and the present disclosure does not limit this.
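The filter-then-regress procedure of step S102 can be sketched as follows. This is an illustration under strong assumptions: filtering is taken to mean zeroing responses below the threshold, and the regression step is replaced by a simple mapping of feature-map positions to image pixels through an assumed stride, rather than any of the regression algorithms named above. The function names, feature values, thresholds, and strides are hypothetical.

```python
def filter_features(feature_map, threshold):
    """Zero out responses below the threshold (treated here as background)."""
    return [[v if v >= threshold else 0.0 for v in row] for row in feature_map]


def region_from_features(filtered, stride):
    """Bounding box (x0, y0, x1, y1) in image pixels of the non-zero responses.

    Stand-in for the regression step: each feature cell is assumed to cover
    a stride x stride block of the image.
    """
    coords = [
        (i, j)
        for i, row in enumerate(filtered)
        for j, v in enumerate(row)
        if v > 0
    ]
    if not coords:
        return None
    rows = [i for i, _ in coords]
    cols = [j for _, j in coords]
    return (
        min(cols) * stride,
        min(rows) * stride,
        (max(cols) + 1) * stride,
        (max(rows) + 1) * stride,
    )


# Hypothetical feature maps at two scales, each with its own threshold.
first_feature = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.0, 0.7, 0.9, 0.2],
    [0.1, 0.1, 0.3, 0.1],
]
second_feature = [
    [0.2, 0.6],
    [0.3, 0.9],
]

first_region = region_from_features(filter_features(first_feature, 0.5), stride=4)
second_region = region_from_features(filter_features(second_feature, 0.5), stride=8)
```

Using a separate threshold per scale mirrors the first and second structural feature thresholds; here both happen to be 0.5 only to keep the example small.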
Step S103: extract color features of the human body region.
The color features characterize the color information of the human body region. For example, a color feature may be a feature based on image grayscale; as another example, a color feature may be a feature based on the RGB (red, green, blue) color channels.
It should be noted that the above color features are only examples, and the present disclosure does not limit this.
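The two kinds of color feature mentioned above can be illustrated with simple region statistics. This is a sketch under assumptions, not the disclosed extractor: the mean RGB value and the mean grayscale value (using the ITU-R BT.601 luma weights) stand in for the color features, and the helper names, the image, and the region are hypothetical.

```python
def crop(image, region):
    """Cut the (x0, y0, x1, y1) region out of a 2D list of pixels."""
    x0, y0, x1, y1 = region
    return [row[x0:x1] for row in image[y0:y1]]


def mean_rgb(pixels):
    """Average (R, G, B) over a 2D list of RGB tuples."""
    flat = [p for row in pixels for p in row]
    return tuple(sum(p[c] for p in flat) / len(flat) for c in range(3))


def mean_gray(pixels):
    """Average grayscale value using the ITU-R BT.601 luma weights."""
    flat = [p for row in pixels for p in row]
    return sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in flat) / len(flat)


# Hypothetical 4x4 RGB image: a red 2x2 "body" region on a black background.
image = [[(0, 0, 0)] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        image[y][x] = (200, 40, 40)

body = crop(image, (1, 1, 3, 3))
rgb_feature = mean_rgb(body)    # RGB-channel-based color feature
gray_feature = mean_gray(body)  # grayscale-based color feature
```

Because the statistics are computed only inside the human body region, background colors do not contribute to the feature.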
In some embodiments, the human body region includes a first human body region and a second human body region, where the first human body region is the region obtained by regressing the first filtered structural feature onto the image to be detected, and the second human body region is the region obtained by regressing the second filtered structural feature onto the image to be detected. The step of extracting the color features of the human body region includes:
first, extracting the color feature of the first human body region to obtain a first color feature; second, extracting the color feature of the second human body region to obtain a second color feature.
In some other embodiments, the step of extracting the color features of the human body region includes: first, extracting the color feature of the first human body region to obtain the first color feature; second, performing a further convolution on the first color feature to extract the second color feature. In other words, in this embodiment the second color feature is not obtained from the second structural feature; instead, it is obtained by further convolving the first color feature. The second color feature is therefore a higher-level, more global feature than the first color feature.
Step S104: determine the human body detection result of the image to be detected according to the structural features and the color features.
The human body detection result includes the human body frame and key point information of the image to be detected. The human body frame is a rectangle or square that indicates the extent of the human body in the image. The key point information includes the coordinates of the human body key points. In some specific implementations, the key points correspond to 17 parts of the human body: the nose, left and right eyes, left and right ears, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles.
In some embodiments, the step of determining the human body detection result of the image to be detected according to the structural features and the color features includes:
first, concatenating the first structural feature and the first color feature to obtain a first concatenated feature; second, concatenating the second structural feature and the second color feature to obtain a second concatenated feature; finally, applying a preset activation function to the first concatenated feature and the second concatenated feature to obtain the human body detection result of the image to be detected.
The structural features and color features may be joined by a concatenation (Concat) function. The activation function includes, but is not limited to, the sigmoid function, the hyperbolic tangent function (Tanh), and the rectified linear unit (ReLU).
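The concatenation and activation operations can be sketched as below. The sketch covers only these two operations; how the activated values are decoded into a human body frame and key point coordinates is not specified in this passage and is not modeled. The feature vectors are hypothetical, and the sigmoid is picked from the listed activation functions only for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, one of the activation functions listed above."""
    return 1.0 / (1.0 + math.exp(-x))


def concat(structural, color):
    """Join a structural feature vector and a color feature vector end to end."""
    return list(structural) + list(color)


# Hypothetical flattened features at the two scales.
first_connected = concat([0.0, 2.0], [-2.0])        # first structural + first color
second_connected = concat([1.0], [0.0, -1.0])       # second structural + second color

# Apply the activation to both concatenated features.
activated = [sigmoid(v) for v in first_connected + second_connected]
```

Concatenation keeps the structural and color channels side by side, so a later layer can weight them separately, whereas the superposition used inside a kernel cluster merges maps of the same type.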
It should be noted that, in some embodiments, after the human body detection result of the image to be detected is determined, the method further includes:
first, determining human pose information according to the human body frame and key point information of the image to be detected; second, issuing an early-warning signal when it is determined from the human pose information that a preset early-warning event has occurred.
For example, in a station, a train carriage, or another public place, human body detection is performed on surveillance video to obtain human body detection results. When the human pose is determined to be a falling pose according to the human body frame and key point information, it is known that a person has fallen, and an early-warning signal can be sent to a staff terminal or a broadcast terminal so that the relevant staff can respond promptly or activate an emergency plan.
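The early-warning flow can be sketched as follows. The disclosure does not state how a falling pose is recognized; the rule below (frame wider than tall, with the nose and ankles at a similar height) is a hypothetical heuristic used only to make the flow concrete. The key point names, `is_fall`, `check_and_alert`, and the `send` callback are likewise assumptions.

```python
# Names for the 17 key points listed in the description (naming is an assumption).
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def is_fall(box, keypoints):
    """Hypothetical rule: the frame is wider than tall and head and ankles are level."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0, y1 - y0
    ankle_y = (keypoints["left_ankle"][1] + keypoints["right_ankle"][1]) / 2
    head_to_ankle = abs(ankle_y - keypoints["nose"][1])
    return width > height and head_to_ankle < 0.5 * width


def check_and_alert(box, keypoints, send):
    """Call send() with a message when the preset event (a fall) is detected."""
    if is_fall(box, keypoints):
        send("fall detected")
        return True
    return False


alerts = []  # stands in for a staff terminal or broadcast terminal

# Only the key points used by the rule are filled in for these two examples.
standing = {"nose": (30, 20), "left_ankle": (25, 165), "right_ankle": (35, 165)}
lying = {"nose": (20, 120), "left_ankle": (160, 125), "right_ankle": (160, 130)}

standing_alert = check_and_alert((10, 10, 50, 170), standing, alerts.append)
lying_alert = check_and_alert((10, 100, 170, 140), lying, alerts.append)
```

In a deployment the `send` callback would be replaced by whatever channel reaches the staff terminal; a list is used here so the result can be inspected.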
In this embodiment, structural features of the image to be detected are extracted; a human body region in the image to be detected is determined according to the structural features; color features of the human body region are extracted; and a human body detection result of the image to be detected is determined according to the structural features and the color features, where the human body detection result includes a human body frame and key point information of the image to be detected. The method uses structural features and color features together for human body detection, so that a more accurate human body detection result can be obtained. Moreover, the corresponding human body detection model can be trained using images annotated only with human body frames; images annotated with key point coordinates are not required, which avoids manual annotation of key point coordinates.
FIG. 2 is a flowchart of a structural feature extraction method provided by an embodiment of the present disclosure. As shown in FIG. 2, the structural feature extraction method includes the following steps:
Step S201: perform feature extraction on the image to be detected with each of multiple convolution kernel clusters to obtain third structural features.
The first convolution kernels include multiple convolution kernel clusters, and each cluster includes at least one convolution kernel. The third structural features are the structural features obtained when each convolution kernel performs feature extraction on the image to be detected; their number equals the number of convolution kernels (in the single-channel case).
In some embodiments, multiple first convolution kernels are set so that structural features can be extracted from the image to be detected comprehensively and accurately. These convolution kernels are divided into multiple convolution kernel clusters through a clustering operation, and each cluster includes at least one convolution kernel. Within each cluster, the convolution kernels are highly similar to one another, in the sense that they all extract a certain type of structural feature well.
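One way to divide kernels into clusters is sketched below. The disclosure only says that a clustering operation is used; plain k-means on flattened kernel weights is an assumption made here for illustration, as are the function names, the initialization with the first k kernels, and the example kernels.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain k-means with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical flattened 2x2 kernels: two vertical-edge-like, two horizontal-edge-like.
kernels = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [0.9, -1.1, 1.0, -0.9],
    [1.1, 0.9, -1.0, -1.0],
]

labels = kmeans(kernels, k=2)

# Group kernel indices by cluster label.
clusters = {}
for index, label in enumerate(labels):
    clusters.setdefault(label, []).append(index)
```

Kernels with similar weights respond to similar structures, so grouping them by weight similarity is one plausible reading of "kernels that extract a certain type of structural feature well".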
Step S202: superimpose the third structural features corresponding to the same convolution kernel cluster to obtain the structural feature corresponding to that cluster.
Here, the first structural feature includes the structural features corresponding to the multiple convolution kernel clusters.
In some embodiments, the third structural features extracted by the convolution kernels of the same cluster are superimposed to obtain the structural feature corresponding to that cluster.
Because the third structural features corresponding to the same cluster respond well to a particular type of structure, the superposition enhances that structure and yields a better structural feature.
Step S203: perform feature extraction on the first structural feature based on a preset second convolution kernel to obtain the second structural feature.
For example, the first convolution kernel includes 100 convolution kernels, which are divided into a first, a second, and a third convolution kernel cluster. The kernels in the first cluster perform well when extracting elbow features, the kernels in the second cluster perform well when extracting wrist features, and the kernels in the third cluster perform well when extracting head features. When these 100 convolution kernels are applied to the image to be detected, 100 third structural features are obtained.
For the first convolution kernel cluster, the third structural features extracted by its kernels are superimposed to obtain the first structural feature corresponding to the first cluster; for the second cluster, the third structural features extracted by its kernels are superimposed to obtain the first structural feature corresponding to the second cluster; and for the third cluster, the third structural features extracted by its kernels are superimposed to obtain the first structural feature corresponding to the third cluster.
In this embodiment, because the first convolution kernel cluster extracts elbow features well, the first structural feature corresponding to the first cluster reflects elbow features better than any single third structural feature does. Similarly, the first structural feature corresponding to the second cluster better reflects wrist features, and the first structural feature corresponding to the third cluster better reflects head features.
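Steps S201 and S202 can be illustrated with a minimal NumPy sketch. It is not the disclosed implementation: the kernel values, the cluster assignment `cluster_ids`, and the helper names `conv2d_valid` and `cluster_features` are all assumptions made for illustration, and "superimposing" is interpreted here as an element-wise sum of feature maps.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def cluster_features(image, kernels, cluster_ids):
    """S201: one 'third' feature map per kernel.
    S202: sum the maps of kernels that share a cluster id."""
    third = [conv2d_valid(image, k) for k in kernels]
    first = {}
    for fmap, cid in zip(third, cluster_ids):
        first[cid] = first.get(cid, 0) + fmap
    return first

# Hypothetical data: 6 random 3x3 kernels assigned to 3 clusters.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernels = rng.standard_normal((6, 3, 3))
cluster_ids = [0, 0, 1, 1, 2, 2]
first_features = cluster_features(image, kernels, cluster_ids)
```

Each entry of `first_features` plays the role of the first structural feature of one cluster; a real model would obtain the kernels through training rather than at random.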
It should be noted that, in some specific implementations, the human body detection method provided by the embodiments of the present disclosure may be implemented by a preset human body detection model, where the human body detection model includes a model constructed based on a neural network.
In some embodiments, before the structural features of the image to be detected are extracted, the method further includes:
training the human body detection model with a preset training set, where the training set includes sample images and the human body frame annotation information of the sample images.
In the related art, the training set used to train a human body detection model includes sample images and the key point coordinate annotation information of the sample images. Through training, the model learns to annotate key point coordinates and can then mark key point coordinates in an image to be detected. However, annotating key point coordinates on sample images usually relies on manual labeling, which is complicated and consumes a great deal of time and labor.
In the embodiments of the present disclosure, the model is trained with a training set that includes sample images and their human body frame annotation information, and it learns to annotate human body frames during training. Human body frame annotation depends on recognizing and extracting features in the image: the more accurate the recognized and extracted features, the more accurate the resulting human body frame and, correspondingly, the more accurate the key point coordinates determined from those features. In other words, the human body detection model provided by the embodiments of the present disclosure only needs to be trained on sample images with human body frame annotation information, yet it still acquires the ability to annotate key point coordinates. No key point coordinates need to be marked on the sample images, which simplifies the operation and saves considerable time and labor.
Fig. 3 is a flowchart of a human body detection model training method provided by an embodiment of the present disclosure.
Step S301: input the training set into the initial human body detection model, and extract detail features of the sample images through a first convolutional network.
Here, the training set includes sample images and the human body frame annotation information of the sample images. In other words, the sample images used to train the human body detection model are images annotated with human body frames. The first convolutional network includes multiple convolutional layers (for example, three convolutional layers) and is used to extract low-level structural features (that is, detail features); the detail features include texture features and the like.
Step S302: extract the first structural feature of the sample image through a second convolutional network, and filter the first structural feature according to a first structural feature threshold to obtain a first filtered structural feature.
Step S303: regress the first filtered structural feature into the sample image to determine a first human body region, and extract the color features of the first human body region to obtain a first color feature.
Step S304: perform global feature extraction on the first structural feature using a third convolutional network to obtain the second structural feature, and filter the second structural feature according to a second structural feature threshold to obtain a second filtered structural feature.
Step S305: regress the second filtered structural feature into the sample image to determine a second human body region, and extract the color features of the second human body region to obtain a second color feature.
Step S306: connect the first structural feature and the first color feature through a connection layer to obtain a first connection feature, and connect the second structural feature and the second color feature through the connection layer to obtain a second connection feature.
Step S307: input the first connection feature and the second connection feature into an activation layer, and obtain the human body detection result of the sample image through activation processing.
Here, the human body detection result includes the human body frame and key point information of the sample image.
Step S308: adjust the parameters of the human body detection model according to the human body detection result, and continue iterative training with the adjusted model until a preset stop condition is met, at which point training stops.
Here, the stop condition may relate to detection accuracy and/or the number of training iterations, which is not limited in the present disclosure. The human body detection model obtained when training stops is regarded as meeting the requirements and can be used for human body detection.
It should be noted that, in some embodiments, after multiple rounds of training, the detection accuracy of the trained model can be further improved by clustering the convolution kernels. Specifically:
First, the convolution kernels of the second convolutional network are clustered to obtain convolution kernel clusters. Optionally, in some specific implementations, only the convolution kernels of the first convolutional layer of the second convolutional network are clustered, so that similar kernels fall into the same cluster. Second, the first structural features corresponding to the same cluster are superimposed to obtain an enhanced first structural feature for that cluster, and the first color feature is extracted using the enhanced first structural feature. Third, the second structural feature is determined using the enhanced first structural feature, and the second color feature is extracted according to the second structural feature. Then, through the connection layer, the first structural feature and the first color feature are connected to obtain the first connection feature, and the second structural feature and the second color feature are connected to obtain the second connection feature. Finally, the first connection feature and the second connection feature are input into the activation layer, and the human body detection result of the sample image is obtained through activation processing; the parameters of the human body detection model are adjusted according to the detection result, and iterative training is performed again with the adjusted model until the preset stop condition is met, at which point training stops and the trained human body detection model is obtained. Among the clusters obtained by clustering, some may extract structural features poorly (for example, clusters that mainly extract noise features); such clusters can be filtered out to improve the accuracy of structural feature extraction.
Convolution kernel clustering and filtering reduce the number of model parameters and improve the generalization ability of the model, so that the human body detection model obtains good detection results across different application scenarios and different types of images.
Fig. 4 is a schematic diagram of the training process of a human body detection model provided by an embodiment of the present disclosure. As shown in Fig. 4, the training set includes multiple sample images annotated with human body frames. After the training set is input into the human body detection model, the detail features of the sample image are first extracted by convolution; the first structural feature is then extracted on the basis of the detail features, and the first color feature is extracted from the sample image according to the first structural feature and the first structural feature threshold. On the basis of the first structural feature, the second structural feature is further extracted, and a global color feature is extracted from the sample image according to the second structural feature and the second structural feature threshold.
Through the connection layer, the first structural feature and the first color feature are connected to form the first connection feature, and the second structural feature and the second color feature are connected to form the second connection feature. The first and second connection features are then input into the activation layer and processed by a preset activation function to obtain the training result.
It should be noted that, after multiple rounds of training, the convolution kernels that extract the first structural feature can be clustered to obtain convolution kernel clusters. Kernels belonging to the same cluster are highly similar, in the sense that they all perform well when extracting a particular type of structural feature. The first structural features extracted by the kernels of the same cluster can therefore be superimposed to obtain an enhanced first structural feature, from which a more accurate second structural feature can be extracted.
Some of these clusters may extract structural features poorly (for example, clusters that mainly extract noise features). To improve structural feature extraction, such clusters can be filtered out; filtering convolution kernels in this way is similar to a "model pruning" operation.
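The kernel clustering and cluster filtering described above can be sketched as follows. The disclosure does not name a clustering algorithm or a filtering criterion, so this sketch assumes k-means on flattened kernels and a hypothetical per-cluster quality score (`cluster_scores`, `min_score`) supplied from outside, for instance from validation.

```python
import numpy as np

def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def keep_mask(labels, cluster_scores, min_score):
    """True for kernels whose cluster score reaches min_score (pruning rule)."""
    return np.array([cluster_scores[int(l)] >= min_score for l in labels])

# Hypothetical kernels: three base patterns (flattened 3x3) plus small noise.
rng = np.random.default_rng(0)
bases = np.array([
    [1.0, 0, -1, 1, 0, -1, 1, 0, -1],   # vertical-edge pattern
    [1.0, 1, 1, 0, 0, 0, -1, -1, -1],   # horizontal-edge pattern
    [0.0, 0, 0, 0, 1, 0, 0, 0, 0],      # centre-spot pattern
])
kernels = np.concatenate([b + 0.01 * rng.standard_normal((4, 9)) for b in bases])

labels = kmeans(kernels, k=3)
# Assumed scores: the cluster containing the last kernel is treated as noise.
cluster_scores = {int(l): 0.9 for l in labels}
cluster_scores[int(labels[-1])] = 0.1
keep = keep_mask(labels, cluster_scores, min_score=0.5)
pruned_kernels = kernels[keep]
```

Dropping whole clusters removes their kernels from the layer, which is where the reduction in parameters comes from.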
The division of the above methods into steps is only for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, such variations fall within the protection scope of the present disclosure. Adding insignificant modifications to an algorithm or process, or introducing insignificant designs, without changing the core design of the algorithm or process, also falls within the protection scope of this application.
A second aspect of the present disclosure provides a human body detection apparatus. Fig. 5 is a block diagram of a human body detection apparatus provided by an embodiment of the present disclosure. As shown in Fig. 5, the human body detection apparatus 500 includes:
A first extraction module 501, configured to extract structural features of the image to be detected, the structural features being used to characterize structural information of the image to be detected.
Here, the structural features characterize the structural information of the image to be detected. For example, when the image to be detected contains a human body, the structural features may be structural characteristics of the head, elbows, joints, feet, and so on. In some specific implementations, the structural features can be extracted from the image to be detected by convolution.
In some embodiments, the structural features of the image to be detected include a first structural feature and a second structural feature, and the first extraction module 501 includes a first extraction unit and a second extraction unit. The first extraction unit is configured to perform feature extraction on the image to be detected based on a preset first convolution kernel to obtain the first structural feature; the second extraction unit is configured to perform feature extraction on the first structural feature based on a preset second convolution kernel to obtain the second structural feature.
Here, a convolution kernel can be regarded as a filter matrix used to extract features from the image being convolved. In this embodiment, the second structural feature is obtained by further convolving the first structural feature; relative to the first structural feature, it is a higher-level and more global feature.
For example, if the first structural feature includes eye features, nose features, and mouth features, the second structural feature is a face feature. As another example, if the first structural feature includes the head, elbow, hand, leg, and foot features of a person in the image to be detected, the second structural feature is the overall structural feature of that person.
It should be noted that the above descriptions of the first and second structural features are only examples; they can be set flexibly according to actual needs, which is not limited in the present disclosure.
A region determination module 502, configured to determine the human body region in the image to be detected according to the structural features.
Here, the image to be detected includes a foreground region and a background region. In human body detection scenarios, the foreground region refers specifically to the human body region, and the background region refers to the region outside the human body region, made up of objects, articles, and the like. Human body detection pays little attention to the features of the background region, so the background region needs to be removed from the image to be detected, or the foreground region extracted from it, so that the foreground region can be further analyzed and processed.
In some embodiments, the region determination module 502 includes a regression unit, which is configured to regress the structural features into the image to be detected and determine the human body region in the image.
In some specific implementations, the structural features include the first structural feature and the second structural feature. The regression unit regresses the first and second structural features into the image to be detected separately, obtaining a first human body region corresponding to the first structural feature and a second human body region corresponding to the second structural feature.
The above embodiment is explained by taking a head feature of the human body as the structural feature. If the head feature is regressed directly into the image to be detected, the corresponding region in the image is usually a regular rectangle with the head located inside it. In other words, a region determined by directly regressing the structural feature contains both the head region and part of the background, so the head region cannot be framed accurately in the image. For this reason, before the structural feature is regressed into the image to be detected, it is first filtered to remove background structural features, yielding a filtered structural feature. When the filtered structural feature is regressed into the image, the regression result contains only the human body region, which achieves accurate framing of the human body structure and provides the region basis for the subsequent extraction of color features.
In some embodiments, the region determination module 502 further includes a filtering unit. The filtering unit is configured to filter the structural features according to a preset structural feature threshold to obtain filtered structural features, where the structural feature threshold is used to filter out background structural features; the regression unit is further configured to regress the filtered structural features into the image to be detected and determine the human body region in the image. The structural feature threshold may be obtained from experience, from statistical data, or through training, which is not limited in the present disclosure.
In some specific implementations, the structural feature threshold includes a first structural feature threshold and a second structural feature threshold. The filtering unit is specifically configured to filter the first structural feature according to the first structural feature threshold to obtain the first filtered structural feature, and the regression unit is specifically configured to regress the first filtered structural feature into the image to be detected to obtain the first human body region. Likewise, the filtering unit is specifically configured to filter the second structural feature according to the second structural feature threshold to obtain the second filtered structural feature, and the regression unit is specifically configured to regress the second filtered structural feature into the image to be detected to obtain the second human body region.
It should be noted that, because different structural feature thresholds are set for different structural features, the filtered structural features obtained with these thresholds are more accurate and reasonable, and a more accurate human body region can therefore be obtained.
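The effect of threshold filtering on the recovered region can be sketched as below. This is a simplification under stated assumptions: the feature map is taken to have the same resolution as the image, so "regressing" the filtered feature into the image reduces to reading off the non-zero positions; the helper names and the threshold value are hypothetical.

```python
import numpy as np

def filter_feature(feature, threshold):
    """Zero out responses below the threshold (treated as background)."""
    return np.where(feature >= threshold, feature, 0.0)

def region_mask_and_box(filtered):
    """Return the foreground mask and its bounding box (row0, col0, row1, col1)."""
    mask = filtered > 0
    if not mask.any():
        return mask, None
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return mask, (int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1]))

# Hypothetical 6x6 feature map: weak background response, strong 3x3 body response.
feature = np.full((6, 6), 0.1)
feature[1:4, 2:5] = 0.9

mask, box = region_mask_and_box(filter_feature(feature, threshold=0.5))
unfiltered_mask, unfiltered_box = region_mask_and_box(feature)
```

Without filtering, every position has a non-zero response and the recovered box covers the whole map, background included; with filtering, only the strong responses remain.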
It should also be noted that algorithms such as linear regression, K-nearest-neighbor regression, decision tree regression, and random forest regression may be used to regress the structural features into the image to be detected, which is not limited in the present disclosure.
A second extraction module 503, configured to extract color features of the human body region, the color features being used to characterize color information of the human body region.
Here, the color features characterize the color information of the human body region. For example, a color feature may be based on image grayscale; as another example, a color feature may be based on the RGB color channels.
It should be noted that the above descriptions of color features are only examples, which are not limited in the present disclosure.
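As one possible reading of grayscale-based and RGB-based color features, the sketch below computes the mean RGB value and a normalized grayscale histogram over a region mask. The specific descriptor, the bin count, and the use of the ITU-R BT.601 luma weights are assumptions for illustration; the disclosure does not fix any of them. Pixel values are assumed to lie in [0, 1] and the mask is assumed to be non-empty.

```python
import numpy as np

def color_features(image_rgb, mask, bins=4):
    """Mean RGB (3 values) followed by a normalized grayscale histogram (bins values)."""
    pixels = image_rgb[mask]                          # shape (N, 3)
    mean_rgb = pixels.mean(axis=0)
    gray = pixels @ np.array([0.299, 0.587, 0.114])   # BT.601 luma weights
    hist, _ = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    return np.concatenate([mean_rgb, hist / len(gray)])

# Hypothetical 4x4 image with an orange 2x2 "body" region on a black background.
image = np.zeros((4, 4, 3))
image[1:3, 1:3] = [1.0, 0.5, 0.0]
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

feats = color_features(image, mask)
```

Restricting the computation to `mask` is what ties the color feature to the human body region rather than to the whole image.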
In some embodiments, the human body region includes a first human body region and a second human body region, where the first human body region is obtained by regressing the first filtered structural feature into the image to be detected, and the second human body region is obtained by regressing the second filtered structural feature into the image to be detected. The second extraction module 503 includes a third extraction unit and a fourth extraction unit. The third extraction unit is configured to extract the color features of the first human body region to obtain the first color feature; the fourth extraction unit is configured to extract the color features of the second human body region to obtain the second color feature.
In some other embodiments, the second extraction module 503 includes a third extraction unit and a fifth extraction unit. The third extraction unit is configured to extract the color features of the first human body region to obtain the first color feature; the fifth extraction unit is configured to perform a further convolution on the first color feature to extract the second color feature. In other words, in this embodiment the second color feature is not obtained from the second structural feature but by further convolving the first color feature, so that, relative to the first color feature, the second color feature is a higher-level and more global feature.
A detection module 504, configured to determine the human body detection result of the image to be detected according to the structural features and the color features, the human body detection result including the human body frame and key point information of the image to be detected.
Here, the human body detection result includes the human body frame and key point information of the image to be detected. The human body frame is a rectangle or a square that indicates the area occupied by the human body in the image. The key point information includes the coordinates of the human body key points. In some specific implementations, the key points correspond to 17 parts of the human body: the nose, left and right eyes, left and right ears, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles.
In some embodiments, the detection module 504 includes a connection unit and an activation unit. The connection unit is configured to connect the first structural feature and the first color feature to obtain the first connection feature, and is further configured to connect the second structural feature and the second color feature to obtain the second connection feature. The activation unit is configured to activate the first connection feature and the second connection feature based on a preset activation function to obtain the human body detection result of the image to be detected.
Here, the structural features and color features can be connected by a Concat function. The activation function includes, but is not limited to, the Sigmoid function, the Tanh function, and the ReLU function.
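The connection and activation operations can be sketched in NumPy as follows. Only the concatenation and the element-wise activation are shown; the learned layers that would map the activated features to a human body frame and key points are omitted, and the function name `connect_and_activate` and the sample vectors are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def connect_and_activate(structural, color, activation=sigmoid):
    """Flatten both features, concatenate them, and apply the activation."""
    connected = np.concatenate([np.ravel(structural), np.ravel(color)])
    return activation(connected)

# Hypothetical feature vectors.
structural = np.array([0.0, 2.0])
color = np.array([-1.0])

out_sigmoid = connect_and_activate(structural, color)
out_relu = connect_and_activate(structural, color, activation=relu)
out_tanh = connect_and_activate(structural, color, activation=np.tanh)
```

Swapping the `activation` argument is enough to switch between the Sigmoid, Tanh, and ReLU options listed above.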
It should be noted that, in some embodiments, the human body detection apparatus further includes an early warning module. The early warning module includes a posture determination unit and an early warning signal transmission unit. Specifically, the posture determination unit is configured to determine human body posture information according to the human body frame and key point information of the image to be detected, after the human body detection result of the image has been determined; the early warning signal transmission unit is configured to issue an early warning signal when it is determined from the human body posture information that a preset early warning event has occurred.
For example, in a station, a train carriage, or a public place, human body detection is performed on surveillance video to obtain human body detection results. When the human body posture is determined to be a falling posture according to the human body frame and key point information, it is known that a person has fallen, and an early warning signal can be sent to a staff terminal or a broadcast terminal so that the relevant staff can handle the emergency or start an emergency plan in time.
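The disclosure does not specify how a falling posture is recognized from the human body frame and key points. Purely as an illustration, the sketch below uses a hypothetical heuristic: the frame is clearly wider than it is tall, and the nose-to-ankle axis is closer to horizontal than vertical. The rule, the `aspect_thresh` value, the key point names, and the `warning_message` helper are all assumptions, not the disclosed method.

```python
def is_fall(box, keypoints, aspect_thresh=1.2):
    """Heuristic fall check. box = (x_min, y_min, x_max, y_max); image y grows downward."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0, y1 - y0
    if height <= 0:
        return False
    wide = width / height > aspect_thresh
    nose_x, nose_y = keypoints["nose"]
    ankle_x = (keypoints["left_ankle"][0] + keypoints["right_ankle"][0]) / 2
    ankle_y = (keypoints["left_ankle"][1] + keypoints["right_ankle"][1]) / 2
    horizontal = abs(nose_x - ankle_x) > abs(nose_y - ankle_y)
    return wide and horizontal

def warning_message(box, keypoints):
    """Return a warning string for a detected fall, otherwise None."""
    return "ALERT: possible fall detected" if is_fall(box, keypoints) else None

# Hypothetical detections.
standing = ((10, 10, 40, 110),
            {"nose": (25, 15), "left_ankle": (20, 105), "right_ankle": (30, 105)})
fallen = ((10, 80, 110, 110),
          {"nose": (15, 95), "left_ankle": (105, 100), "right_ankle": (105, 104)})

standing_msg = warning_message(*standing)
fallen_msg = warning_message(*fallen)
```

In a deployment, the returned message would be forwarded to a staff terminal or broadcast terminal; that transport is outside the scope of this sketch.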
It should also be noted that the human body detection apparatus disclosed in this embodiment may be deployed or run in physical servers, virtual servers, and various electronic terminals, which is not limited in the present disclosure.
In this embodiment, the first extraction module extracts the structural features of the image to be detected; the region determination module determines the human body region in the image to be detected according to the structural features; the second extraction module extracts the color features of the human body region; and the detection module determines the human body detection result of the image to be detected according to the structural features and the color features. Because the apparatus uses structural features and color features together for human body detection, it can obtain a human body detection result with relatively high accuracy. In addition, the human body detection model corresponding to the apparatus can be trained using images annotated with human body frames only; images annotated with key point coordinates are not required, which avoids manual labeling of key point coordinates.
FIG. 6 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
Referring to FIG. 6, an embodiment of the present disclosure provides an electronic device, which includes: at least one processor 601; at least one memory 602; and one or more I/O interfaces 603 connected between the processor 601 and the memory 602. The memory 602 stores one or more computer programs executable by the at least one processor 601, and the one or more computer programs are executed by the at least one processor 601 so that the at least one processor 601 can perform the human body detection method described above.
An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the human body detection method described above when executed by a processor/processing core. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure also provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the human body detection method described above.
Those of ordinary skill in the art will understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and apparatuses, may be implemented as software, firmware, hardware, or an appropriate combination thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed jointly by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable program instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technologies, portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable program instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery medium.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
The computer program product described herein may be implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, as will be apparent to those skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless explicitly stated otherwise. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (13)

  1. A human body detection method, comprising:
    extracting structural features of an image to be detected, the structural features being used to characterize structural information of the image to be detected;
    determining a human body region in the image to be detected according to the structural features;
    extracting color features of the human body region, the color features being used to characterize color information of the human body region; and
    determining a human body detection result of the image to be detected according to the structural features and the color features, the human body detection result comprising a human body frame and key point information of the image to be detected.
  2. The human body detection method according to claim 1, wherein the structural features comprise a first structural feature and a second structural feature;
    the extracting structural features of an image to be detected comprises:
    performing feature extraction on the image to be detected based on a preset first convolution kernel to obtain the first structural feature; and
    performing feature extraction on the first structural feature based on a preset second convolution kernel to obtain the second structural feature.
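As an illustration of the two-stage extraction recited in claim 2, the sketch below applies one kernel to the image and a second kernel to the resulting feature map. It assumes a single-channel image, "valid" padding, stride 1, and the cross-correlation form of convolution commonly used in convolutional networks; the function names and kernel values are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """2-D cross-correlation with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the window anchored at (i, j).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def extract_structural_features(image, first_kernel, second_kernel):
    """Return (first_structural_feature, second_structural_feature)."""
    first = conv2d_valid(image, first_kernel)
    second = conv2d_valid(first, second_kernel)  # operates on the first feature
    return first, second
```

Because the second kernel consumes the first feature rather than the raw image, the second structural feature has a larger receptive field and a smaller spatial size than the first.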
  3. The human body detection method according to claim 2, wherein the first convolution kernel comprises a plurality of convolution kernel clusters, each convolution kernel cluster comprising at least one convolution kernel,
    the performing feature extraction on the image to be detected based on a preset first convolution kernel to obtain the first structural feature comprises:
    performing feature extraction on the image to be detected through each of the plurality of convolution kernel clusters to obtain third structural features; and
    superimposing the third structural features corresponding to the same convolution kernel cluster to obtain a structural feature corresponding to that convolution kernel cluster, wherein the first structural feature comprises the structural features corresponding to the plurality of convolution kernel clusters.
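The cluster-wise extraction in claim 3 can be sketched as follows, assuming that "superimposing" means an element-wise sum of the feature maps produced by the kernels of one cluster, and that all kernels share the same shape so the per-cluster results can be stacked. Both points are assumptions, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(image, kernel):
    """2-D cross-correlation with no padding and stride 1."""
    windows = sliding_window_view(image, kernel.shape)  # (out_h, out_w, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def first_structural_feature(image, clusters):
    """clusters: list of clusters, each a non-empty list of 2-D kernels.

    Returns an array of shape (num_clusters, out_h, out_w).
    """
    per_cluster = []
    for kernels in clusters:
        # One "third structural feature" per kernel in the cluster.
        third = [correlate_valid(image, k) for k in kernels]
        # Superimpose (sum) the maps that belong to the same cluster.
        per_cluster.append(np.sum(third, axis=0))
    return np.stack(per_cluster)
```

Each cluster therefore contributes one channel to the first structural feature, regardless of how many kernels it contains.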
  4. The human body detection method according to claim 1, wherein the determining a human body region in the image to be detected according to the structural features comprises:
    filtering the structural features according to a preset structural feature threshold to obtain filtered structural features, wherein the structural feature threshold is used to filter out background structural features from the structural features; and
    regressing the filtered structural features to the image to be detected to determine the human body region in the image to be detected.
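A minimal sketch of the filtering and regression steps in claim 4 is given below. It assumes that background responses are those below the threshold, and that regression to the image amounts to scaling feature-map coordinates by a fixed stride and taking the bounding rectangle of the surviving responses. The claim does not specify either detail, so both are assumptions, as are the function names.

```python
import numpy as np


def filter_features(feature, threshold):
    """Zero out responses below the threshold (treated as background)."""
    return np.where(feature >= threshold, feature, 0.0)


def region_from_features(filtered, stride):
    """Map non-zero feature cells back to an image-space box (x1, y1, x2, y2).

    `stride` is the assumed ratio between image and feature-map resolution.
    Returns None when no response survives the filter.
    """
    rows, cols = np.nonzero(filtered)
    if rows.size == 0:
        return None
    x1, y1 = cols.min() * stride, rows.min() * stride
    x2, y2 = (cols.max() + 1) * stride, (rows.max() + 1) * stride
    return int(x1), int(y1), int(x2), int(y2)
```

With two structural features and two thresholds, as in claim 5, the same pair of functions would be applied once per feature to obtain the first and second human body regions.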
  5. The human body detection method according to claim 4, wherein the structural feature threshold comprises a first structural feature threshold and a second structural feature threshold, and the filtered structural features comprise a first filtered structural feature and a second filtered structural feature;
    the filtering the structural features according to a preset structural feature threshold to obtain filtered structural features comprises:
    filtering the first structural feature according to the first structural feature threshold to obtain the first filtered structural feature; and
    filtering the second structural feature according to the second structural feature threshold to obtain the second filtered structural feature.
  6. The human body detection method according to claim 5, wherein the human body region comprises a first human body region and a second human body region, the first human body region being a region obtained by regressing the first filtered structural feature to the image to be detected, and the second human body region being a region obtained by regressing the second filtered structural feature to the image to be detected;
    the extracting color features of the human body region comprises:
    extracting color features of the first human body region to obtain a first color feature; and
    extracting color features of the second human body region to obtain a second color feature.
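The claim does not say which color descriptor is extracted from each region. As one possible choice, assumed here for illustration only, the sketch below computes a normalized per-channel histogram over the pixels inside a region; the function name `color_feature`, the box format, and the bin count are hypothetical.

```python
import numpy as np


def color_feature(image, box, bins=8):
    """Per-channel color histogram of the region inside `box`.

    image: (H, W, C) array with values in [0, 255].
    box: (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
    Returns a vector of length C * bins; each channel's bins sum to 1.
    """
    x1, y1, x2, y2 = box
    region = image[y1:y2, x1:x2]
    hists = [
        np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
        for c in range(region.shape[-1])
    ]
    feature = np.concatenate(hists).astype(float)
    pixel_count = region.shape[0] * region.shape[1]
    return feature / pixel_count if pixel_count else feature
```

Calling the function once with the first human body region and once with the second yields the first and second color features.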
  7. The human body detection method according to claim 6, wherein the determining a human body detection result of the image to be detected according to the structural features and the color features comprises:
    connecting the first structural feature and the first color feature to obtain a first connection feature;
    connecting the second structural feature and the second color feature to obtain a second connection feature; and
    performing activation processing on the first connection feature and the second connection feature based on a preset activation function to obtain the human body detection result of the image to be detected.
  8. The human body detection method according to any one of claims 1-7, wherein the human body detection method is implemented by a preset human body detection model.
  9. The human body detection method according to claim 8, further comprising, before the extracting structural features of an image to be detected:
    training the human body detection model with a preset training set, wherein the training set comprises sample images and human body frame annotation information of the sample images.
  10. A human body detection apparatus, comprising:
    a first extraction module configured to extract structural features of an image to be detected, the structural features being used to characterize structural information of the image to be detected;
    a region determination module configured to determine a human body region in the image to be detected according to the structural features;
    a second extraction module configured to extract color features of the human body region, the color features being used to characterize color information of the human body region; and
    a detection module configured to determine a human body detection result of the image to be detected according to the structural features and the color features, the human body detection result comprising a human body frame and key point information of the image to be detected.
  11. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the human body detection method according to any one of claims 1-9 when executing the computer program.
  12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the human body detection method according to any one of claims 1-9.
  13. A computer program product, comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein, when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the human body detection method according to any one of claims 1-9.
PCT/CN2022/111687 2021-11-05 2022-08-11 Human body detection method and apparatus, electronic device, and computer-readable storage medium WO2023077897A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111303347.XA CN113762221B (en) 2021-11-05 2021-11-05 Human body detection method and device
CN202111303347.X 2021-11-05

Publications (1)

Publication Number Publication Date
WO2023077897A1 true WO2023077897A1 (en) 2023-05-11

Family

ID=78784601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/111687 WO2023077897A1 (en) 2021-11-05 2022-08-11 Human body detection method and apparatus, electronic device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113762221B (en)
WO (1) WO2023077897A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762221B (en) * 2021-11-05 2022-03-25 通号通信信息集团有限公司 Human body detection method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120274755A1 (en) * 2011-04-29 2012-11-01 Tata Consultancy Services Limited System and method for human detection and counting using background modeling, hog and haar features
CN108038469A (en) * 2017-12-27 2018-05-15 百度在线网络技术(北京)有限公司 Method and apparatus for detecting human body
CN110188776A (en) * 2019-05-30 2019-08-30 京东方科技集团股份有限公司 Image processing method and device, the training method of neural network, storage medium
CN112001251A (en) * 2020-07-22 2020-11-27 山东大学 Pedestrian re-identification method and system based on combination of human body analysis and clothing color
CN113762221A (en) * 2021-11-05 2021-12-07 通号通信信息集团有限公司 Human body detection method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6664163B2 (en) * 2015-08-05 2020-03-13 キヤノン株式会社 Image identification method, image identification device, and program
CN108229418B (en) * 2018-01-19 2021-04-02 北京市商汤科技开发有限公司 Human body key point detection method and apparatus, electronic device, storage medium, and program
CN110298212B (en) * 2018-03-21 2023-04-07 腾讯科技(深圳)有限公司 Model training method, emotion recognition method, expression display method and related equipment
CN109214346B (en) * 2018-09-18 2022-03-29 中山大学 Picture human body action recognition method based on hierarchical information transmission
CN110163127A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 A kind of video object Activity recognition method from thick to thin
CN111680781B (en) * 2020-04-20 2023-07-25 北京迈格威科技有限公司 Neural network processing method and device, electronic equipment and storage medium
CN112001229B (en) * 2020-07-09 2021-07-20 浙江大华技术股份有限公司 Method, device and system for identifying video behaviors and computer equipment
CN112883880B (en) * 2021-02-25 2022-08-19 电子科技大学 Pedestrian attribute identification method based on human body structure multi-scale segmentation, storage medium and terminal

Also Published As

Publication number Publication date
CN113762221B (en) 2022-03-25
CN113762221A (en) 2021-12-07
