CN111523494A - Human body image detection method - Google Patents

Human body image detection method

Info

Publication number
CN111523494A
CN111523494A (application CN202010341723.3A)
Authority
CN
China
Prior art keywords
human body
image
detection
loss
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010341723.3A
Other languages
Chinese (zh)
Inventor
侯峦轩
马鑫
赫然
孙哲南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co ltd
Original Assignee
Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co ltd filed Critical Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co ltd
Priority to CN202010341723.3A priority Critical patent/CN111523494A/en
Publication of CN111523494A publication Critical patent/CN111523494A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses a human body image detection method comprising the following steps: preprocessing an input training image with automatic data augmentation and data expansion; detecting the human body with a feature pyramid network built on dilated-convolution bottleneck blocks; cropping the bounding box around the detected human body and keeping only the image inside the box; and feeding the cropped image into the designed model for training to obtain a pedestrian detection model. The invention performs two-dimensional spatial detection on an input image containing a human body; the generated image carries accurate human body spatial information at a small computational cost.

Description

Human body image detection method
Technical Field
The invention relates to the technical field of image processing, in particular to a human body image detection method.
Background
Human body image detection refers to marking the spatial geometric position of a human body in an image that contains one. Human detection is a computer technology, related to computer vision and image processing, for detecting semantic objects of a particular class in digital images and videos. Intensively researched object detection domains include face detection and pedestrian detection. Human detection has applications in many areas of computer vision, including image retrieval and video surveillance.
Because the human body is articulated, it takes on varied postures and shapes; its appearance is strongly affected by clothing, pose, and viewing angle, and is further subject to occlusion, lighting, and similar factors, which makes pedestrian detection a very challenging topic in computer vision. The main difficulties pedestrian detection must solve are:
First: large appearance variation, including viewing angle, pose, apparel and accessories, lighting, and imaging distance. Pedestrians look very different from different viewpoints, and pedestrians in different postures also differ greatly in appearance. Different clothing, and accessories such as open umbrellas, hats, scarves, and luggage, change the appearance substantially. Differences in illumination add further difficulty, and a distant human body looks very different from a nearby one.
Second: occlusion. In many application scenes pedestrians are dense and severely occluded, so only part of the body is visible, which poses a serious challenge to detection algorithms.
Third: complex backgrounds. Indoors or outdoors, the backgrounds faced by pedestrian detection are generally very complex, and some objects are very similar to the human body in appearance, shape, color, and texture, so the algorithm cannot distinguish them accurately.
Fourth: detection speed. Pedestrian detection generally uses complex models with a large amount of computation; reaching real time is difficult and usually requires extensive optimization.
The idea of background modeling algorithms is to learn a background model from previous frames and then compare the current frame against the background to obtain the moving target, i.e., the changed region of the image. Background modeling is simple and fast to implement, but has the following problems: only moving targets can be detected, and stationary targets cannot be handled; illumination changes and shadows have a large influence; if the target's color is very close to the background, misses and fragmentation occur; it is easily disturbed by bad weather such as rain and snow and by nuisance motion such as swaying leaves; and multiple adhered or overlapping targets cannot be separated. The root cause is that these background modeling algorithms use only pixel-level information and do not exploit higher-level semantic information in the image.
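As a toy illustration of the background-modeling idea just described (a background learned from previous frames, with the current frame thresholded against it), the sketch below uses a generic exponential running average and a fixed threshold; both are illustrative choices, not any specific algorithm from the patent:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running average of past frames as the background model."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=30):
    """Changed region: pixels whose difference from the background is large."""
    return np.abs(frame.astype(float) - bg) > thresh
```

Note how the mask is purely pixel-level, which is exactly why such methods miss stationary targets and confuse background-colored objects, as the text observes.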
The second family of methods is based on machine learning. Machine-learning-based methods are the mainstream of current pedestrian detection algorithms, mostly following the hand-crafted-features-plus-classifier scheme. The human body has its own appearance characteristics; features can be designed manually and then used to train a classifier that separates pedestrians from the background. Common features include color, edges, and texture; common classifiers include neural networks, SVM, AdaBoost, and random forests. Since this is a detection problem, a sliding-window technique is generally used.
As the technology develops further, high-quality, high-accuracy human bounding boxes matter greatly for user experience and market competitiveness. The bounding boxes produced by existing human body image methods do not yet meet this requirement and carry large uncertainty. It is therefore necessary to further improve human body detection methods.
Disclosure of Invention
The invention aims to provide a human body image detection method that addresses the speed and accuracy problems of existing detection methods, improving the quality of the generated human body bounding boxes and reducing uncertainty.
In order to achieve the purpose of the invention, the invention provides a human body image detection method, which comprises the following steps:
S1, preprocessing the image data in an image database: applying automatic data enhancement to the original images, where each specific automatic data enhancement operation is described by a triple of the operation, its application probability P, and its magnitude M;
S2, feeding the original image into a feature pyramid network based on dilated convolution for detection, and outputting only the human body image marked with a bounding box; a deep neural network model that marks human body images with bounding boxes is obtained through training; the human body images augmented and cropped in step S1 are used as the network input, the json files in the training set that annotate the human bounding boxes in xy-axis coordinate form are used as the ground truth, and the detection network in the deep neural network model is trained, yielding a trained detection neural network model that maps a human body image to the same image with a bounding box;
and S3, performing pose estimation processing on the images containing human bodies in the test data set with the trained deep neural network model.
Further, the enhancement process includes random flipping, random rotation, and random scaling, with specific parameters.
Furthermore, the feature pyramid network FPN processes the picture with a specific data enhancement method, modifies the last two stages of the FPN specifically for target detection, and crops the detected human body image as input,
the method specifically comprises the following steps:
adopting ResNet50 as the backbone network to extract features, and randomly initializing the ResNet50 network with a standard Gaussian distribution;
according to the features extracted by ResNet50, the four feature maps are retained and named P2, P3, P4, P5, and stage 5 is added by attaching a convolution with kernel size 1 × 1, its feature map being P6; and after stage 4 the spatial resolution of the feature maps is kept constant, i.e.

S_x = i / 2^x for x ≤ 4, and S_x = i / 2^4 for x > 4,

where S_x denotes the spatial resolution of the stage-x feature map, i is the original image size, and x ∈ {2, 3, 4, 5, 6}; at P4, P5, P6, convolutions with kernel size 1 × 1 are attached to keep the numbers of channels consistent;
and finally, the feature maps of stages 4-6 are summed according to the pyramid framework to form the FPN feature pyramid, target detection is performed with the Fast RCNN method, and the network is constrained by a regression loss and a classification loss. The classification loss and the regression loss are fused; the classification loss adopts the log loss, the regression loss is the same as in R-CNN, and the total loss function is:

L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_loc(t^u, v)

Two branches are attached to the last fully connected layer of the detection network. One is a softmax used to classify each ROI region; if K classes are to be distinguished, the output is p = (p_0, ..., p_K). The other is a bounding-box regressor for refining the ROI regions, whose output

t^k = (t^k_x, t^k_y, t^k_w, t^k_h)

represents the coordinates of the bounding box of class k, where (x, y) are the coordinates of the upper-left corner of the bounding box and (x + w, y + h) the coordinates of the lower-right corner; u is the ground truth of each ROI region, and v is the regression target of the ground-truth bounding box; λ is a hyperparameter controlling the balance between the two task losses, here λ = 1; and the indicator [u ≥ 1] equals 1 when u ≥ 1.

The classification loss is specifically:

L_cls(p, u) = -log p_u

a loss function in log form;

the regression loss is specifically:

L_loc(t^u, v) = Σ_{i ∈ {x, y, w, h}} smooth_L1(t^u_i - v_i)

where v = (v_x, v_y, v_w, v_h) is the position of the real box of class u, t^u = (t^u_x, t^u_y, t^u_w, t^u_h) is the predicted box position of class u, and

smooth_L1(x) = 0.5 x^2 if |x| < 1, |x| - 0.5 otherwise.
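The multi-task loss described above follows the standard Fast R-CNN form, and can be sketched numerically as below; the function names and array shapes are illustrative assumptions, not code from the patent:

```python
import numpy as np

def smooth_l1(x):
    # smooth_L1(x) = 0.5 x^2 if |x| < 1, else |x| - 0.5
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def total_loss(p, u, t_u, v, lam=1.0):
    """p: class probabilities (K+1,); u: ground-truth class id (0 = background);
    t_u: predicted box offsets for class u (4,); v: regression targets (4,)."""
    l_cls = -np.log(p[u])                                  # L_cls = -log p_u
    l_loc = smooth_l1(np.asarray(t_u, float) - np.asarray(v, float)).sum()
    return l_cls + lam * (1.0 if u >= 1 else 0.0) * l_loc  # [u >= 1] gate
```

The indicator [u ≥ 1] means background ROIs (u = 0) contribute only a classification term, since there is no box to regress for them.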
compared with the prior art, the human body detection network has the advantages that the problem of contradiction between operation performance and detection performance existing in detection is solved by the human body detection network according to the properties, the detection performance is improved by keeping the spatial resolution of the characteristic diagram and expanding the receptive field by using the cavity convolution, and the human body detection image with a very good perception effect can be generated by combining the human body image boundary frame detection model with the cavity convolution. In addition, because the existing detection method based on deep learning generally generalizes a classification network to a human body detection task by adding a convolution layer, most of the pre-training models are obtained based on the classification network at present and are not beneficial to directly generalizing to the human body detection model, by means of the proposed human body image detection model of the deep neural network fusing the cavity convolution, a residual error network is used as the basis for constructing the model, and a pyramid structure, particularly a related bounding lattice, is combined, so that the perception field of the model is larger, the effect is better, and the generalization capability is stronger.
Drawings
FIG. 1 is a process flow diagram of the method of the present invention;
FIG. 2 is a block diagram of a human body detection network according to the present invention;
FIG. 3 illustrates the connection of operations between p _4, p _5, and p _6 according to the present invention;
FIG. 4 is a block diagram of the different types of bottleneck blocks of the present invention;
FIG. 5 is a schematic diagram of the summing operation in the detection network of the present invention;
FIG. 6 is a sample visualization result of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
As shown in FIGS. 1-6, the human body image detection method comprises the following steps:
in step S1, specific data enhancement is first performed on the image training set data, and first we define all possible data enhancements that can be applied to the image, as shown in the following table (the parameters all correspond to the parameters of the TensorFlow corresponding function):
[Table of all candidate data enhancement operations, reproduced only as an image in the original publication]
the following specific operations were employed:
[Table of the specific operations adopted, reproduced only as an image in the original publication]
an enhancement policy is defined as an unordered set of K sub-policies. During training, one of the K sub-strategies will be randomly selected and then applied to the current image. Each sub-strategy has 2 image enhancement operations, where P is the probability value (between the range 0-1) for each operation, M is the parameter magnitude, and each parameter magnitude is normalized to be within the interval 0-10.
In step S2, the training input data are used to train the human body image detection model of the neural network fused with dilated-convolution bottlenecks, so as to complete the human body image detection task.
In step S2, the cropped images containing human bodies and the annotation information from step S1 are used as the network input, and the annotated human bounding boxes (json files with rectangular boxes marked in xy-axis coordinate form) are used as the ground truth for training the human detection network in the deep model, completing the task from image input to output of the image with a bounding box. Specifically, after the human body images detected by the detection network are cropped, feature maps are extracted with ResNet50 as the backbone network.
And in step S3, target detection is performed on the images in the training data set using the human detector; among all class boxes only the human bounding boxes are retained, a cropping operation generates human body images of corresponding size 224 × 224, the human-bounding-box json annotation files in the data set are used as the annotation information of the corresponding human bodies, and the COCO API is called to accelerate I/O reading.
The human body detection network is trained on all 80 classes of the COCO data set, and finally the human class is selected for output (the output image marks the human body with a bounding box). The specific structure is shown in FIG. 2; the specific design of the human detection network and the modules in the figure are explained as follows:
adopting ResNet50 as the backbone network to extract features, and randomly initializing the ResNet50 network with a standard Gaussian distribution;
according to the features extracted by ResNet50, the four feature maps are retained and named P2, P3, P4, P5, and stage 5 is added by attaching a convolution with kernel size 1 × 1, its feature map being P6;
and after stage 4 we keep the spatial resolution of the feature maps unchanged, i.e.

S_x = i / 2^x for x ≤ 4, and S_x = i / 2^4 for x > 4,

where downsampling between stages is accomplished by 3 × 3 convolutions or pooling layers with stride 2, S_x denotes the spatial resolution of the stage-x feature map, and i is the original image size (here 224 × 224), with x ∈ {2, 3, 4, 5, 6}; at P4, P5, P6, convolutions with kernel size 1 × 1 are attached to keep the numbers of channels consistent (256 channels).
The transformations among P4, P5, P6 are realized by two types of bottleneck blocks, A and B, whose design is shown in FIG. 4; each type of bottleneck consists of 1 × 1 convolutions, a 3 × 3 convolution with dilation rate 2, and ReLU layers.
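The point of the dilated 3 × 3 convolutions in these bottlenecks is to enlarge the receptive field without further downsampling. Standard convolution arithmetic (generic geometry, not code from the patent) makes this concrete: a 3 × 3 kernel with dilation d spans 3 + 2(d − 1) pixels, and with matching padding the spatial resolution is preserved:

```python
def effective_kernel(k, d):
    """Effective extent of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

def out_size(n, k, stride=1, pad=0, d=1):
    """Output spatial size of a convolution over an n x n input."""
    return (n + 2 * pad - effective_kernel(k, d)) // stride + 1
```

So a 3 × 3 convolution with dilation 2 sees a 5 × 5 window, yet with stride 1 and padding 2 it leaves the feature-map size unchanged — which is how the network keeps the stage-4 resolution constant while growing the receptive field.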
And finally, the feature maps of stages 4-6 are summed according to the pyramid framework, with the lateral-connection summing shown in FIG. 5, to form the FPN feature pyramid; target detection is performed with the Fast RCNN method, constrained by a regression loss and a classification loss. The multi-loss fusion (classification loss plus regression loss) is the prediction operation in FIG. 2; the classification loss is a log loss (the negative log of the probability of the true class, the classification output being (K + 1)-dimensional), and the regression loss is the same as in R-CNN (smooth L1 loss). The overall loss function is:

L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_loc(t^u, v)

Two branches are attached to the last fully connected layer of the detection network. One is a softmax used to classify each ROI region; if K classes are to be distinguished (adding the background, K + 1 classes in total), the output is p = (p_0, ..., p_K). The other is a bounding-box regressor for refining the ROI regions, whose output

t^k = (t^k_x, t^k_y, t^k_w, t^k_h)

represents the coordinates of the bounding box of class k, where (x, y) are the coordinates of the upper-left corner of the bounding box and (x + w, y + h) the coordinates of the lower-right corner. u is the ground truth of each ROI region, and v is the regression target of the ground-truth bounding box. λ is a hyperparameter controlling the balance between the two task losses, here λ = 1. The indicator [u ≥ 1] equals 1 when u ≥ 1.
The classification loss is specifically:

L_cls(p, u) = -log p_u

a loss function in log form.

The regression loss is specifically:

L_loc(t^u, v) = Σ_{i ∈ {x, y, w, h}} smooth_L1(t^u_i - v_i)

where v = (v_x, v_y, v_w, v_h) is the position of the real box of class u, t^u = (t^u_x, t^u_y, t^u_w, t^u_h) is the predicted box position of class u, and

smooth_L1(x) = 0.5 x^2 if |x| < 1, |x| - 0.5 otherwise.
In addition, the cropping operation first expands the box to a fixed aspect ratio, then crops, and then applies data enhancement such as random flipping, random rotation, and random scaling to the bounding-box region of the image containing the human bounding box.
Further, in all training steps the data set is the MSCOCO training set (57K images containing 150K human instances). After detection by the detector network (FPN + RoIAlign) in step S2, among all detected bounding boxes only the human bounding boxes are used (i.e., in all experiments, the boxes of the human class among the top 100 boxes of all classes). Each human bounding box is expanded to the fixed aspect ratio height:width = 384:288, and the cropped image is correspondingly resized to the default height of 384 pixels and width of 288 pixels. The corresponding data enhancement policy is then applied to the cropped image: random rotation (-45° to +45°) and random scaling (0.7 to 1.35). The annotation information of the corresponding picture (the json file containing the human-bounding-box positions) is used as the ground truth.
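Expanding a detected person box to the fixed height:width = 384:288 aspect ratio before cropping can be sketched as follows; the grow-about-center policy (and the omission of clipping to image bounds) is an assumption, since the patent does not specify how the box is anchored during expansion:

```python
def expand_to_aspect(x, y, w, h, target_h=384, target_w=288):
    """Grow (never shrink) a box about its center to height:width = 384:288.
    Clipping to image bounds is omitted for brevity."""
    cx, cy = x + w / 2.0, y + h / 2.0
    ratio = target_h / target_w          # 4:3
    if h / w > ratio:                    # too tall relative to 384:288 -> widen
        w = h / ratio
    else:                                # too wide -> heighten
        h = w * ratio
    return cx - w / 2.0, cy - h / 2.0, w, h
```

Growing only the shorter side ensures the whole detected body stays inside the crop, after which the region can be resized to 384 × 288 without distortion.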
To describe the specific embodiment of the invention in detail and verify its effectiveness, the proposed method was trained on a public data set. The database contains photos of natural scenes such as animals and animated characters (used as distractors to improve the robustness of the model and its applicability to real natural scenes). All images of the data set are selected as the training set, image data are automatically augmented, target detection is performed on all training images with the trained feature pyramid network (FPN), only the human-class bounding boxes are output, the corresponding cropped human body images are generated, and the global network and the refinement network are trained with gradient back-propagation until convergence, yielding the human detection model.
To test the validity of the model, input images were processed; the visualization is shown in FIG. 6. In the experiment, the results were compared with the ground-truth images, as shown in FIG. 1. This embodiment effectively demonstrates the effectiveness of the proposed method for human body image detection.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (3)

1. A human body image detection method is characterized by comprising the following steps:
S1, preprocessing the image data in an image database: applying automatic data enhancement to the original images, where each specific automatic data enhancement operation is described by a triple of the operation, its application probability P, and its magnitude M;
S2, feeding the original image into a feature pyramid network based on dilated convolution for detection, and outputting only the human body image marked with a bounding box; a deep neural network model that marks human body images with bounding boxes is obtained through training; the human body images augmented and cropped in step S1 are used as the network input, the json files in the training set that annotate the human bounding boxes in xy-axis coordinate form are used as the ground truth, and the detection network in the deep neural network model is trained, yielding a trained detection neural network model that maps a human body image to the same image with a bounding box;
and S3, performing pose estimation processing on the images containing human bodies in the test data set with the trained deep neural network model.
2. The human body image detection method according to claim 1, wherein the enhancement process includes random flipping, random rotation, and random scaling, with specific parameters.
3. The human body image detection method according to claim 1, wherein the feature pyramid network FPN processes the picture with a specific data enhancement method, modifies the last two stages of the FPN specifically for target detection, and crops the detected human body image as input,
the method specifically comprises the following steps:
adopting ResNet50 as the backbone network to extract features, and randomly initializing the ResNet50 network with a standard Gaussian distribution;
according to the features extracted by ResNet50, the four feature maps are retained and named P2, P3, P4, P5, and stage 5 is added by attaching a convolution with kernel size 1 × 1, its feature map being P6; and after stage 4 the spatial resolution of the feature maps is kept constant, i.e.

S_x = i / 2^x for x ≤ 4, and S_x = i / 2^4 for x > 4,

where S_x denotes the spatial resolution of the stage-x feature map, i is the original image size, and x ∈ {2, 3, 4, 5, 6}; at P4, P5, P6, convolutions with kernel size 1 × 1 are attached to keep the numbers of channels consistent;
and finally, the feature maps of stages 4-6 are summed according to the pyramid framework to form the FPN feature pyramid, target detection is performed with the Fast RCNN method, and the network is constrained by a regression loss and a classification loss. The classification loss and the regression loss are fused; the classification loss adopts the log loss, the regression loss is the same as in R-CNN, and the total loss function is:

L(p, u, t^u, v) = L_cls(p, u) + λ[u ≥ 1] L_loc(t^u, v)

Two branches are attached to the last fully connected layer of the detection network. One is a softmax used to classify each ROI region; if K classes are to be distinguished, the output is p = (p_0, ..., p_K). The other is a bounding-box regressor for refining the ROI regions, whose output

t^k = (t^k_x, t^k_y, t^k_w, t^k_h)

represents the coordinates of the bounding box of class k, where (x, y) are the coordinates of the upper-left corner of the bounding box and (x + w, y + h) the coordinates of the lower-right corner; u is the ground truth of each ROI region, and v is the regression target of the ground-truth bounding box; λ is a hyperparameter controlling the balance between the two task losses, here λ = 1; and the indicator [u ≥ 1] equals 1 when u ≥ 1,
the classification loss is specifically:

L_cls(p, u) = -log p_u

a loss function in log form;

the regression loss is specifically:

L_loc(t^u, v) = Σ_{i ∈ {x, y, w, h}} smooth_L1(t^u_i - v_i)

where v = (v_x, v_y, v_w, v_h) is the position of the real box of class u, t^u = (t^u_x, t^u_y, t^u_w, t^u_h) is the predicted box position of class u, and

smooth_L1(x) = 0.5 x^2 if |x| < 1, |x| - 0.5 otherwise.
CN202010341723.3A 2020-04-27 2020-04-27 Human body image detection method Pending CN111523494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010341723.3A CN111523494A (en) 2020-04-27 2020-04-27 Human body image detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010341723.3A CN111523494A (en) 2020-04-27 2020-04-27 Human body image detection method

Publications (1)

Publication Number Publication Date
CN111523494A true CN111523494A (en) 2020-08-11

Family

ID=71903067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010341723.3A Pending CN111523494A (en) 2020-04-27 2020-04-27 Human body image detection method

Country Status (1)

Country Link
CN (1) CN111523494A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580778A (en) * 2020-11-25 2021-03-30 江苏集萃未来城市应用技术研究所有限公司 Job worker mobile phone use detection method based on YOLOv5 and Pose-animation
CN112686282A (en) * 2020-12-11 2021-04-20 天津中科智能识别产业技术研究院有限公司 Target detection method based on self-learning data
CN114693935A (en) * 2022-04-15 2022-07-01 湖南大学 Medical image segmentation method based on automatic data augmentation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124415A1 (en) * 2015-11-04 2017-05-04 Nec Laboratories America, Inc. Subcategory-aware convolutional neural networks for object detection
CN108038409A (en) * 2017-10-27 2018-05-15 江西高创保安服务技术有限公司 A kind of pedestrian detection method
CN109063559A (en) * 2018-06-28 2018-12-21 东南大学 A kind of pedestrian detection method returned based on improvement region
CN110321923A (en) * 2019-05-10 2019-10-11 上海大学 Object detection method, system and the medium of different scale receptive field Feature-level fusion
CN110443144A (en) * 2019-07-09 2019-11-12 天津中科智能识别产业技术研究院有限公司 A kind of human body image key point Attitude estimation method
CN111160085A (en) * 2019-11-19 2020-05-15 天津中科智能识别产业技术研究院有限公司 Human body image key point posture estimation method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
R. Girshick: "Fast R-CNN", 2015 IEEE International Conference on Computer Vision (ICCV) *
Zeming Li et al.: "DetNet: A Backbone Network for Object Detection", arXiv:1804.06215v2 [cs.CV] *


Similar Documents

Publication Publication Date Title
CN108446617B (en) Side face interference resistant rapid human face detection method
JP4898800B2 (en) Image segmentation
CN111523494A (en) Human body image detection method
Geng et al. Using deep learning in infrared images to enable human gesture recognition for autonomous vehicles
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN113408584B (en) RGB-D multi-modal feature fusion 3D target detection method
Nedović et al. Stages as models of scene geometry
JP2002203239A (en) Image processing method for detecting human figure in digital image
CN113408594B (en) Remote sensing scene classification method based on attention network scale feature fusion
CN110110755B (en) Pedestrian re-identification detection method and device based on PTGAN region difference and multiple branches
CN112686928B (en) Moving target visual tracking method based on multi-source information fusion
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN110909724B (en) Thumbnail generation method of multi-target image
Wang et al. Mask-RCNN based people detection using a top-view fisheye camera
CN114565675A (en) Method for removing dynamic feature points at front end of visual SLAM
Al-Heety Moving vehicle detection from video sequences for traffic surveillance system
CN114782417A (en) Real-time detection method for digital twin characteristics of fan based on edge enhanced image segmentation
CN115019274A (en) Pavement disease identification method integrating tracking and retrieval algorithm
CN109448093B (en) Method and device for generating style image
Shuai et al. An improved YOLOv5-based method for multi-species tea shoot detection and picking point location in complex backgrounds
Ilehag et al. Classification and representation of commonly used roofing material using multisensorial aerial data
CN117079125A (en) Kiwi fruit pollination flower identification method based on improved YOLOv5
Li et al. Multi-class weather classification based on multi-feature weighted fusion method
CN115713546A (en) Lightweight target tracking algorithm for mobile terminal equipment
Bertozzi et al. Multi stereo-based pedestrian detection by daylight and far-infrared cameras

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200811