AU2019101224A4 - Method of Human detection research and implement based on deep learning - Google Patents

Method of Human detection research and implement based on deep learning

Info

Publication number
AU2019101224A4
AU2019101224A4
Authority
AU
Australia
Prior art keywords
convolutional
residual
data
training
photographs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2019101224A
Inventor
Zikai Shu
Zeyuan Wu
Tianyu Xin
Fengkun YANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yang Fengkun Miss
Original Assignee
Yang Fengkun Miss
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yang Fengkun Miss filed Critical Yang Fengkun Miss
Priority to AU2019101224A priority Critical patent/AU2019101224A4/en
Application granted granted Critical
Publication of AU2019101224A4 publication Critical patent/AU2019101224A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

This invention lies in the field of deep learning, building on an improved YoloV3. Pedestrians are detected in pictures through the following steps. First, we acquire a large number of photographs from websites, so that the model has plenty of learning opportunities and superior data. Then, after selection and pre-processing, we divide the photographs into training and testing sets, using the training photographs to give the model practice in distinguishing pedestrians. Next, we download YoloV3 from GitHub and use it to train on our data; by adjusting the parameters of the algorithm and improving parts of its structure, the accuracy of human detection is improved considerably. Finally, we feed the test data to the model, which detects whether pedestrians are present in the photographs without human involvement.
[Figure 1: structure diagram of YoloV3 (graphic not reproducible in text)]
[Figure 2: Darknet-53 layer table listing type (convolutional, residual, avgpool, connected, softmax), filter counts, kernel size/stride and output resolutions from 256 x 256 down to 8 x 8 (graphic not reproducible in text)]

Description

TITLE
Method of Human detection research and implement based on deep learning
FIELD OF THE INVENTION
This invention lies in the field of deep learning and is based on YoloV3.
BACKGROUND
Recently, with the rapid development of artificial intelligence, the requirements of machine automation have gradually risen, and human detection has entered our daily life: our smartphones recognise our faces to unlock. Public security systems also use it to identify suspects. With such wide application, the accuracy of face recognition also needs to be raised; this matters not only for operational efficiency but also for a stronger guarantee of safety. Traditional face recognition has a larger error in accuracy. Based on an improved Yolo_V3, we control the parameters of the artificial neural network, which increases accuracy without human involvement. This will be conducive to its application in future life. Face recognition can replace conventional identity authentication and play a greater role in many places, such as airports: it reduces the workload of staff, facilitates management, and improves the efficiency of access. Therefore, this face recognition technology will undoubtedly have more extensive scope for use in the future.
Our invention is based on the Yolo_V3 algorithm and improves its deep convolutional neural network. It improves performance to a certain extent and raises accuracy by continuously optimizing the value of each parameter during training.
SUMMARY
In order to improve the accuracy and efficiency of image recognition, and to reduce the error of existing neural network algorithms to some extent, we use a modified Yolo_V3 algorithm for human detection. By adjusting the parameters, the accuracy of image recognition is lifted, so that classification becomes more precise. Fully exploiting the advantage of automatic feature extraction in deep learning, we can judge whether there is a face in an image and extract its features, which gives the method wide application prospects. To build the image database, we label and convert images from the Internet, divide them into training sets and test sets, and put them into the Yolo_V3 convolutional neural network shown in the figure, whose structure can be seen intuitively. The processing pipeline of Yolo_V3 can be seen as a series of combinations of three basic components. First, CBL, composed of convolution, batch normalisation (BN) and Leaky ReLU, is the smallest component of Yolo_V3. Second, resN, where N stands for a number, draws on the residual structure of ResNet to make the network deeper; its basic component is also CBL. Third, concat, a tensor splicing component, splices the upsampled output of a middle layer with some of the later layers; it expands the channel dimension of the tensors rather than adding them directly. In the whole structure of Yolo_V3 there is no pooling layer and no fully connected layer. In forward propagation, the spatial size of a tensor is changed by setting the stride of the convolutions: each such transformation halves the side length, and the network performs five of them, so the output is reduced to 1/32 of the input. To guarantee a valid output feature map, the side length of the input image must therefore be adjusted to a multiple of 32, usually 416 x 416. To improve training efficiency, we put the data sets into the network in batches to reduce the loss function. Yolo_V3 is a multi-scale training network structure, so we can choose between speed and accuracy according to our needs, which is also a manifestation of its flexibility. To improve accuracy, we make certain adjustments to its selection, tuning its inputs and parameters so that it discards unnecessary samples and gives results in line with our expectations.
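The downsampling arithmetic above can be sketched in a few lines of Python (an illustrative helper, not part of the patented method):

```python
# Five stride-2 convolutions each halve the side length, so the network
# downsamples the input by 2**5 = 32 overall; hence the side length must
# be a multiple of 32 (e.g. 416) for the feature maps to stay integral.
def feature_map_sizes(side, num_stride2_convs=5):
    if side % 2 ** num_stride2_convs != 0:
        raise ValueError("input side should be a multiple of 32")
    sizes = []
    for _ in range(num_stride2_convs):
        side //= 2
        sizes.append(side)
    return sizes

# A 416 x 416 input passes through 208, 104, 52, 26 and finally 13;
# the last three sizes are exactly the three detection scales discussed later.
print(feature_map_sizes(416))  # [208, 104, 52, 26, 13]
```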
DESCRIPTION OF DRAWING
Figure 1 shows the structure of Yolo_V3 and its basic components.
Figure 2 shows changing the stride of the convolutions.
Figure 3 shows prediction of the target boundary frame.
DESCRIPTION OF PREFERRED EMBODIMENT
Network Design
Firstly, after the convolution calculation, batch normalisation is applied in the BN layer to determine the dominant training direction of the pictures and discard data far from the regression line, so that the trained pictures have more uniform features. The data is then activated by the Leaky ReLU layer. The deep neural network is divided into several parts, and the shallow sub-networks are trained with shortcut connections to control the propagation of the gradient and prevent gradient vanishing or even gradient explosion. The other two basic components, Res_unit and Resblock_body, are also composed of several CBLs; a Resblock_body containing X res_units is abbreviated resX. The zero-padding step in resX extends the edges of the gradually shrinking feature map with zeros, preserving the characterisation of the image.
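A minimal sketch of two ideas above, the Leaky ReLU activation and the residual shortcut, in plain Python (illustrative only; a real CBL block would use a deep learning framework's convolution and batch-norm layers):

```python
def leaky_relu(x, alpha=0.1):
    # Leaky ReLU passes positives unchanged and scales negatives by alpha,
    # so the gradient never dies completely on the negative side.
    return x if x > 0 else alpha * x

def res_unit(x, transform):
    # Residual shortcut: output = input + F(input). Because the identity
    # path is always present, gradients can flow straight through, which
    # helps prevent gradient vanishing or explosion in deep stacks.
    return [xi + ti for xi, ti in zip(x, transform(x))]

print([leaky_relu(v) for v in [-2.0, 0.5]])  # [-0.2, 0.5]
print(res_unit([1.0, 2.0], lambda xs: [leaky_relu(v) for v in xs]))  # [2.0, 4.0]
```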
Output of YOLO_V3----predictions across scales
What are predictions across scales?
It uses feature pyramid networks for reference, with multiple scales to detect targets of different sizes: the finer the grid cell, the finer the objects that can be detected.
The output depth of Yolo_V3 is 255, and the ratio of the edge lengths is 13:26:52. COCO has 80 categories, so each box should output a probability for each category.
Yolo_V3 sets three boxes for each grid cell, so each box needs five basic parameters (x, y, w, h, confidence) plus the probabilities of the 80 categories, giving 3 x (5 + 80) = 255. That is how the 255 comes about.
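The arithmetic behind the output depth can be checked directly:

```python
# Each grid cell predicts 3 boxes; each box carries 5 basic parameters
# (x, y, w, h, confidence) plus one probability per COCO class.
boxes_per_cell = 3
box_params = 5
num_classes = 80
depth = boxes_per_cell * (box_params + num_classes)
print(depth)  # 255
```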
Yolo_V3 implements this multi-scale feature map by up-sampling. It can be seen from the structure chart above that the two tensors joined by each concat have the same scale (the two joins are at the 26 x 26 and 52 x 52 scales respectively); the tensor scales at the concat joins are made equal by (2, 2) up-sampling.
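The up-sample-then-concat step can be illustrated with simple shape bookkeeping (shapes here are (channels, height, width); the specific channel counts are illustrative assumptions, not values from the patent):

```python
def upsample2(shape):
    # (2, 2) up-sampling doubles height and width, leaving channels alone.
    c, h, w = shape
    return (c, 2 * h, 2 * w)

def concat_channels(a, b):
    # concat joins along the channel axis, so spatial sizes must agree;
    # this expands the tensor dimensions rather than adding values.
    assert a[1:] == b[1:], "spatial sizes must match before concat"
    return (a[0] + b[0], a[1], a[2])

deep = (256, 13, 13)              # coarse 13 x 13 map
up = upsample2(deep)              # -> (256, 26, 26)
skip = (512, 26, 26)              # earlier 26 x 26 map
print(concat_channels(up, skip))  # (768, 26, 26)
```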
Each anchor prior (named anchor prior, though this is not the anchor mechanism) consists of two numbers, one representing height and the other representing width.
Yolo_V3 uses logistic regression to predict the bounding box, an operation much like the linear-regression box adjustment in RPN. Each time Yolo_V3 predicts a bounding box, the output is (tx, ty, tw, th, to); the absolute (x, y, w, h, c) is then calculated from these by the corresponding formulas.
Logistic regression is used to give an objectness score to the region surrounded by an anchor, that is, how likely the location is to be the target. This step is done before prediction; it removes unnecessary anchors and reduces the amount of calculation.
If a template box is not optimal, it will not be predicted even if it exceeds the threshold we set. Unlike Faster R-CNN, Yolo_V3 operates on only one prior, the best one: logistic regression is used to find the prior with the highest objectness score among the nine anchor priors, modelling the mapping between a prior and its objectness score with a sigmoid curve.
Prediction of Target Boundary Frame
The Yolo_V3 network makes convolution predictions through (4 + 1 + C) x K convolution kernels of size 1 x 1 on three feature maps, where K is the number of bounding-box priors (K defaults to 3) and C is the number of predicted target categories. Of these, 4K parameters predict the offsets of the target boundary box, K parameters predict the probability that the box contains a target, and C x K parameters predict the probabilities that the K preset boxes correspond to the C target categories. The dotted rectangle in the figure is the preset boundary box; the solid rectangle is the predicted boundary box computed from the offsets predicted by the network. From the centre coordinates and the width and height of the preset boundary box on the feature map, together with the centre offsets and the width/height ratios predicted by the network, the final predicted target boundary box is calculated; the transformation from preset box to final predicted box is shown in the formula on the right side of the figure. The sigmoid function reduces the predicted offset to between 0 and 1, so that the centre of the boundary box stays fixed within one cell; the author states that this accelerates the convergence of the network.
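The preset-box-to-predicted-box transformation described above can be sketched as follows (variable names follow the usual YOLOv3 write-ups: (cx, cy) is the cell's top-left corner and (pw, ph) the prior's size; this is an illustration, not the patent's own code):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    # The sigmoid confines the predicted centre offset to (0, 1), so the
    # box centre stays inside cell (cx, cy); width and height rescale the
    # prior exponentially, which keeps them positive.
    bx = cx + sigmoid(tx)
    by = cy + sigmoid(ty)
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh

# Zero offsets put the centre in the middle of the cell and keep the prior's size.
print(decode_box(0, 0, 0, 0, 5, 5, 2, 3))  # (5.5, 5.5, 2.0, 3.0)
```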
The following figure shows the size of the feature map of three prediction layers and the size of the preset boundary box on each feature map.
Calculation of Loss Function (formula 1)
The loss function of Yolo_V3 is mainly divided into three parts: the loss of target location offset, the loss of target confidence and the loss of target classification, weighted by balance coefficients λ1, λ2, λ3:

L(O, o, C, c, l, g) = λ1 L_conf(o, c) + λ2 L_cla(O, C) + λ3 L_loc(l, g)   (1)
Target Confidence Loss (formula 2)
Target confidence can be understood as the probability that a target exists in the predicted rectangular box. Binary cross entropy is used for the target confidence loss, where o_i ∈ {0, 1} denotes whether a target really exists in predicted target boundary box i (0 denotes non-existence, 1 denotes existence) and ĉ_i is the sigmoid probability that the network predicts a target in rectangular box i:

L_conf(o, c) = −Σ_i (o_i ln ĉ_i + (1 − o_i) ln(1 − ĉ_i)),   ĉ_i = Sigmoid(c_i)   (2)
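Formula 2 is ordinary binary cross entropy; a direct transcription (illustrative, operating on pre-computed sigmoid scores):

```python
import math

def confidence_loss(o, c_hat):
    # o[i] is 1 if box i really contains a target, else 0;
    # c_hat[i] is the sigmoid confidence the network predicts for box i.
    return -sum(oi * math.log(ci) + (1 - oi) * math.log(1 - ci)
                for oi, ci in zip(o, c_hat))

# Confident, correct predictions give a small loss.
loss = confidence_loss([1, 0], [0.9, 0.1])
print(round(loss, 4))  # 0.2107
```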
Target Category Loss (formula 3)
The target category loss also uses the binary cross-entropy loss, where O_ij ∈ {0, 1} indicates whether a class-j target really exists in predicted target boundary box i (0 means non-existence, 1 means existence) and Ĉ_ij is the sigmoid probability that the network predicts box i to contain a class-j target:

L_cla(O, C) = −Σ_{i∈pos} Σ_{j∈cla} (O_ij ln Ĉ_ij + (1 − O_ij) ln(1 − Ĉ_ij)),   Ĉ_ij = Sigmoid(C_ij)   (3)

Target Location Loss (formula 4)
The sum of squares of the differences between the true offsets and the predicted offsets is used, where l̂ denotes the predicted rectangular-box coordinate offsets and ĝ the coordinate offsets between the matched ground-truth box (GT box) and the default box; these are computed from the predicted target box parameters, the default box parameters and the matched real target box parameters, all mapped on the prediction feature map:

L_loc(l, g) = Σ_{i∈pos} Σ_{m∈{x,y,w,h}} (l̂_i^m − ĝ_i^m)²   (4)
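Formula 4 is a plain sum of squared offset differences over the positive boxes; a minimal transcription (boxes given as (x, y, w, h) offset tuples, illustrative only):

```python
def location_loss(pred_offsets, true_offsets):
    # Sum of squared differences between predicted offsets (l-hat) and
    # ground-truth offsets (g-hat), over every positive box and each of
    # the four coordinates x, y, w, h.
    return sum((p - g) ** 2
               for box_p, box_g in zip(pred_offsets, true_offsets)
               for p, g in zip(box_p, box_g))

print(location_loss([(0.5, 0.5, 1.0, 1.0)], [(0.5, 0.5, 1.0, 2.0)]))  # 1.0
```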
Procedure
Step 1: Data Acquisition
In this project, we used MS COCO, a public data set in the field of target detection. Built by Microsoft, it covers detection, segmentation, key-points and other tasks, and mainly addresses three problems: detecting non-iconic views of objects (detection in the usual sense), contextual reasoning between objects, and the precise localization of 2D objects (the usual segmentation problem). On average, a COCO image contains 3.5 categories and 7.7 instance targets, so the data set is large not only in volume but also in the number of types and instances.
Step 2: Data Pre-processing
Then, we pre-process the data set. We extract the features of the data and label them, and convert the XML-format labels of the data set to TXT labels to suit the Yolo_V3 network. This project is pedestrian detection, so we define the picture category as human. Finally, we divide the data set into INRIA_train and INRIA_test for training and testing.
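The XML-to-TXT conversion boils down to re-expressing each box in YOLO's normalised centre/size form. A sketch of the per-box arithmetic (the function and its fields are assumptions for illustration; the patent does not give its conversion code):

```python
def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, cls=0):
    # YOLO TXT labels store: class-id, centre-x, centre-y, width, height,
    # all normalised to [0, 1] by the image dimensions. With a single
    # "human" category, cls is always 0 in this project.
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

print(voc_box_to_yolo(0, 0, 100, 200, 200, 400))
# 0 0.250000 0.250000 0.500000 0.500000
```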
Step 3: Training and optimization
In optimizing the parameter set, we put the data into the network for training in batches, in order to reduce the computation and improve the training effect.
Besides, we use transfer learning to improve our Yolo_V3 structure. Transfer learning uses a pre-trained model as a checkpoint from which to train a neural network model for a new task. It transfers common characteristic data and information, avoiding re-learning this knowledge and achieving fast learning.
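The idea can be caricatured with a toy parameter dictionary (a deliberately simplified illustration, not real checkpoint loading): copy the pretrained backbone weights, reinitialise only the task-specific head, and continue training from there.

```python
def transfer(pretrained, head_layers, init=0.0):
    # Start from the checkpoint, keep the shared backbone weights as-is,
    # and reinitialise only the layers named in head_layers for the new task.
    model = dict(pretrained)
    for name in head_layers:
        model[name] = init
    reused = sorted(n for n in model if n not in head_layers)
    return model, reused

ckpt = {"conv1": 0.42, "conv2": -1.3, "head": 7.0}
model, reused = transfer(ckpt, {"head"})
print(reused)         # ['conv1', 'conv2']
print(model["head"])  # 0.0
```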
Step 4: Testing
We adjust the parameters of the network constantly in order to reach optimal performance. We set the number of classes to 1 to make the recognition rate as accurate as possible. Then we put the test set into the network and obtain the recognition accuracy.
The data and recognition rates are as follows:
Table 1: The data before Transfer Learning:
Class Images Targets Recognition rate
all 288 597 0.892
all 288 597 0.909
all 288 597 0.917
Table 2: The data after Transfer Learning:
Class Images Targets Recognition rate
all 288 597 0.925
all 288 597 0.928
all 288 597 0.93
We can observe clearly that the recognition rate is about 90 percent before transfer learning, and after transfer learning it improves to about 93 percent. This shows that transfer learning is a good method for training convolutional neural networks.

Claims (3)

  1. Method of Human detection research and implement based on deep learning, wherein said method fully trains the data, with repeated random selection of training samples from the original data and random selection of the final test results, so that the results are true and reliable.
  2. The method according to claim 1, wherein, to avoid over-fitting, every DBL element in the yolov3 network structure has a BN layer to regularize the data, which ensures the robustness of the network structure; the accuracy of recognition is also improved by parameter adjustment and transfer learning.
  3. The method according to claim 1, wherein, in the process of training, the repeated data are trained repeatedly to prevent accidental situations, so that the results are reliable.
AU2019101224A 2019-10-05 2019-10-05 Method of Human detection research and implement based on deep learning Ceased AU2019101224A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2019101224A AU2019101224A4 (en) 2019-10-05 2019-10-05 Method of Human detection research and implement based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2019101224A AU2019101224A4 (en) 2019-10-05 2019-10-05 Method of Human detection research and implement based on deep learning

Publications (1)

Publication Number Publication Date
AU2019101224A4 true AU2019101224A4 (en) 2020-01-16

Family

ID=69146725

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2019101224A Ceased AU2019101224A4 (en) 2019-10-05 2019-10-05 Method of Human detection research and implement based on deep learning

Country Status (1)

Country Link
AU (1) AU2019101224A4 (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408321A (en) * 2020-03-16 2021-09-17 中国人民解放军战略支援部队信息工程大学 Real-time target detection method and device for lightweight image and video data
CN113408321B (en) * 2020-03-16 2023-08-22 中国人民解放军战略支援部队信息工程大学 Real-time target detection method and device for lightweight image and video data
CN113469321A (en) * 2020-03-30 2021-10-01 聚晶半导体股份有限公司 Object detection device and object detection method based on neural network
US11495015B2 (en) 2020-03-30 2022-11-08 Altek Semiconductor Corp. Object detection device and object detection method based on neural network
CN113077406A (en) * 2020-11-25 2021-07-06 无锡乐骐科技有限公司 Convolution filling method based on optimization
CN113077406B (en) * 2020-11-25 2022-06-14 无锡乐骐科技股份有限公司 Image convolution filling method based on optimization
CN112633174A (en) * 2020-12-23 2021-04-09 电子科技大学 Improved YOLOv4 high-dome-based fire detection method and storage medium
CN112633174B (en) * 2020-12-23 2022-08-02 电子科技大学 Improved YOLOv4 high-dome-based fire detection method and storage medium
CN113052184A (en) * 2021-03-12 2021-06-29 电子科技大学 Target detection method based on two-stage local feature alignment
CN117036234A (en) * 2023-05-09 2023-11-10 中国铁路广州局集团有限公司 Mixed steel rail ultrasonic B-display map damage identification method, system and storage medium

Similar Documents

Publication Publication Date Title
AU2019101224A4 (en) Method of Human detection research and implement based on deep learning
US10586103B2 (en) Topographic data machine learning method and system
Nie et al. Pavement distress detection based on transfer learning
NL2025689B1 (en) Crop pest detection method based on f-ssd-iv3
CN108830188A (en) Vehicle checking method based on deep learning
Aditya et al. Batik classification using neural network with gray level co-occurence matrix and statistical color feature extraction
US11468266B2 (en) Target identification in large image data
CN110543906B (en) Automatic skin recognition method based on Mask R-CNN model
CN104462494A (en) Remote sensing image retrieval method and system based on non-supervision characteristic learning
CN110334594A (en) A kind of object detection method based on batch again YOLO algorithm of standardization processing
CN108831530A (en) Vegetable nutrient calculation method based on convolutional neural networks
Fan et al. A novel sonar target detection and classification algorithm
Moate et al. Vehicle detection in infrared imagery using neural networks with synthetic training data
Shi et al. Underwater dense targets detection and classification based on YOLOv3
Li et al. An outstanding adaptive multi-feature fusion YOLOv3 algorithm for the small target detection in remote sensing images
CN116882486B (en) Method, device and equipment for constructing migration learning weight
CN109271833A (en) Target identification method, device and electronic equipment based on the sparse self-encoding encoder of stack
Hazra et al. Handwritten English character recognition using logistic regression and neural network
Rishita et al. Dog breed classifier using convolutional neural networks
CN106951888B (en) Relative coordinate constraint method and positioning method of human face characteristic point
Nacir et al. YOLO V5 for traffic sign recognition and detection using transfer learning
US20210303757A1 (en) Semiconductor fabrication process parameter determination using a generative adversarial network
Shishkin et al. Implementation of yolov5 for detection and classification of microplastics and microorganisms in marine environment
CN114821098A (en) High-speed pavement damage detection algorithm based on gray gradient fusion characteristics and CNN
Mahara et al. Integrating location information as geohash codes in convolutional neural network-based satellite image classification

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry