AU2019101224A4 - Method of Human detection research and implement based on deep learning - Google Patents
Method of Human detection research and implement based on deep learning
- Publication number
- AU2019101224A4
- Authority
- AU
- Australia
- Prior art keywords
- convolutional
- residual
- data
- training
- photographs
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Image Analysis (AREA)
Abstract
This invention lies in the field of deep learning and builds on an improved YoloV3. It detects pedestrians in pictures through the following steps. First, we acquire a large number of photographs from websites, so that the network has enough pictures to learn from and can reach superior performance. After selection and pre-processing, we split the photographs into a training set and a test set, and use the training photographs to teach the network to distinguish pedestrians. We then download YoloV3 from GitHub and train it on our data. By adjusting the parameters of the algorithm and improving parts of its structure, the accuracy of human detection can be improved considerably. Finally, we feed the test data to the network, which decides whether photographs contain pedestrians without human involvement.

Figure 1

Type | Filters | Size | Output |
---|---|---|---|
Convolutional | 32 | 3×3 | 256 × 256 |
Convolutional | 64 | 3×3 / 2 | 128 × 128 |
1× Convolutional | 32 | 1×1 | |
Convolutional | 64 | 3×3 | |
Residual | | | 128 × 128 |
Convolutional | 128 | 3×3 / 2 | 64 × 64 |
2× Convolutional | 64 | 1×1 | |
Convolutional | 128 | 3×3 | |
Residual | | | 64 × 64 |
Convolutional | 256 | 3×3 / 2 | 32 × 32 |
8× Convolutional | 128 | 1×1 | |
Convolutional | 256 | 3×3 | |
Residual | | | 32 × 32 |
Convolutional | 512 | 3×3 / 2 | 16 × 16 |
8× Convolutional | 256 | 1×1 | |
Convolutional | 512 | 3×3 | |
Residual | | | 16 × 16 |
Convolutional | 1024 | 3×3 / 2 | 8 × 8 |
4× Convolutional | 512 | 1×1 | |
Convolutional | 1024 | 3×3 | |
Residual | | | 8 × 8 |
Avgpool | | Global | |
Connected | | 1000 | |
Softmax | | | |

Figure 2
Description
TITLE
Method of Human detection research and implement based on deep learning
FIELD OF THE INVENTION
This invention lies in the field of deep learning and improves on the YoloV3 algorithm.
BACKGROUND
Recently, with the rapid development of artificial intelligence, the requirements of machine automation have gradually risen, and human detection has entered our daily life: our smart-phones recognize our faces and unlock themselves, and even public security systems use recognition to identify suspects. As the technology is more widely applied, the accuracy of face recognition also needs to be raised; it matters not only for the efficiency of operations but also for the strength of our safety guarantees. Traditional face recognition has a large error in its accuracy rate. Improved by Yolo_V3, we control the parameters of artificial neural networks, which increases accuracy without human involvement. This will be conducive to its application in future life. Face authentication can replace identity checks in many places, such as airports: it reduces the workload of staff, facilitates management, and improves the efficiency of access. Therefore, this face recognition technology will undoubtedly have more extensive use in the future.

2019101224 05 Oct 2019
Our invention is based on the Yolo_V3 algorithm to improve the deep learning convolution neural network. Our invention improves the performance to a certain extent and improves the accuracy by continuously optimizing the values of each parameter in the training process.
SUMMARY
In order to improve the accuracy and efficiency of image recognition, and to reduce the error of existing neural network algorithms to some extent, we use a modified Yolo_V3 algorithm for human detection. By adjusting the parameters, the accuracy of image recognition can be lifted, so that the network makes more accurate classifications. Fully exploiting the advantage of automatic feature extraction in deep learning, we can judge whether there is a face in a picture and extract its features, which gives the method a wide application prospect in deep learning. To build the image database, we label and convert images from the Internet, and divide them into training sets and test sets. They are fed into the Yolo_V3 convolutional neural network shown in the figure, whose structure can be read off intuitively. The processing pipeline of Yolo_V3 can be seen as three basic components combined in series. First, CBL, composed of a convolution, batch normalization (BN) and a Leaky ReLU, is the smallest component of Yolo_V3. Second, resN, where N stands for a number: this component draws on the residual structure of ResNet to make the network deeper, and its basic building block is also the CBL. Third, concat, a tensor splicing component, splices the up-sampled middle layers with some of the later layers; it expands the dimensions of the tensors rather than adding them directly. The whole structure of Yolo_V3 contains no pooling layer and no fully connected layer. In forward propagation, the size of a tensor is changed by changing the stride of the convolution. Every time such a size transformation is made, the side length is halved; the network performs five of these transformations, so it reduces the image to 1/32 of the input. To guarantee well-formed output feature maps, the side length of the input image must therefore be a multiple of 32, usually 416×416. To improve training efficiency, we put the data set into the network in batches to reduce the loss function. Yolo_V3 is a multi-scale training network, so we can choose between speed and accuracy according to our needs, which is one manifestation of its flexibility. To improve its accuracy, we adjust its inputs and parameters so that it discards unnecessary samples and gives results in line with our expectations.
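The size arithmetic described above can be sketched in a few lines; the function name is illustrative, not from the patent, and it only models the side-length halving, not the convolutions themselves:

```python
def output_grid_size(input_size: int, downsample_steps: int = 5) -> int:
    """Side length of the feature map after repeated stride-2 convolutions.

    Yolo_V3 halves the side length five times, so the output is 1/32 of
    the input; the input side must therefore be a multiple of 32.
    """
    if input_size % (2 ** downsample_steps) != 0:
        raise ValueError("input side length must be a multiple of 32")
    for _ in range(downsample_steps):
        input_size //= 2  # each stride-2 convolution halves the side length
    return input_size

print(output_grid_size(416))  # 416 / 32 = 13
```

For the usual 416×416 input this yields the 13×13 coarsest prediction grid.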
DESCRIPTION OF DRAWING
Figure 1 shows the structure of Yolo_V3 and its basic components. Figure 2 shows changing the step size (stride) of the convolution. Figure 3 shows the prediction of the target boundary frame.
DESCRIPTION OF PREFERRED EMBODIMENT
Network Design
Firstly, after the convolution calculation, batch standardization is performed in the BN layer to determine the training direction of the majority of pictures and to discard data that lie far from the regression line, so that the trained pictures have more uniform features. Then the data is activated by a Leaky ReLU layer. The deep neural network is divided into several sub-segments, and each shallow sub-segment is trained with a shortcut connection to control the propagation of the gradient and prevent gradient vanishing or even gradient explosion. The other two basic components, Res_unit and Resblock_body, are also composed of several CBLs; a Resblock_body is abbreviated resX according to the X res_units it contains. The zero-padding step in resX expands the gradually shrinking feature map by extending its edges with zeros, preserving the characterization of the image.
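A minimal sketch of the BN / Leaky ReLU / shortcut pipeline described above, with the convolution itself omitted for brevity; all function names are illustrative assumptions, not the patent's code:

```python
import math

def batch_norm(xs, eps=1e-5):
    """Normalize a batch of activations to zero mean and unit variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def leaky_relu(x, slope=0.1):
    """Leaky ReLU activation used throughout Yolo_V3."""
    return x if x > 0 else slope * x

def cbl(xs):
    """CBL minus the convolution: BN followed by Leaky ReLU."""
    return [leaky_relu(x) for x in batch_norm(xs)]

def res_unit(xs):
    """Shortcut connection: the input is added back to the CBL output,
    which keeps gradients flowing through deep stacks of layers."""
    return [x + y for x, y in zip(xs, cbl(xs))]
```

The shortcut addition in `res_unit` is what prevents the gradient from dispersing as the network gets deeper.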
Output of YOLO_V3----predictions across scales
What are predictions across scales?

Yolo_V3 uses feature pyramid networks for reference, detecting targets of different sizes at multiple scales. The finer the grid cell, the finer the objects that can be detected.
The output depth of Yolo_V3 is 255, and the ratio of the feature-map edge lengths is 13:26:52. COCO has 80 categories, so each box should output a probability for each category.
Yolo_V3 sets three boxes for each grid cell, and each box needs five basic parameters (x, y, w, h, confidence) plus the probabilities of the 80 categories, so 3 × (5 + 80) = 255. That is how the 255 comes about.
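The depth arithmetic can be written down directly; the function name is an illustrative assumption:

```python
def output_depth(num_boxes: int = 3, num_classes: int = 80) -> int:
    """Channels per grid cell: each box carries (x, y, w, h, confidence)
    plus one probability per class."""
    return num_boxes * (5 + num_classes)

print(output_depth())      # 3 * (5 + 80) = 255 for the COCO classes
print(output_depth(3, 1))  # 18 for the single 'human' class used later
```

The same formula gives the smaller output depth once the class count is reduced to 1 for pedestrian-only detection.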
Yolo_V3 implements this multi-scale feature map by means of up-sampling. It can be seen from the structure chart above that the two tensors joined by concat have the same scale (the two joints are at 26 × 26 and 52 × 52 respectively); the scales are made equal by (2,2) up-sampling.
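The (2,2) up-sampling that makes the concat scales match can be sketched as nearest-neighbour repetition; this toy version works on a plain 2-D grid rather than a full channel tensor, and the name is an assumption:

```python
def upsample2x(grid):
    """Nearest-neighbour (2, 2) up-sampling: every value is repeated
    twice along both axes, e.g. a 13x13 map becomes 26x26."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat along the width
        out.append(wide)
        out.append(list(wide))                     # repeat along the height
    return out
```

After up-sampling, the two same-sized maps can be spliced along the channel dimension, which is exactly the concat step.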
Each anchor prior (named anchor prior, but not anchor mechanism) consists of two numbers, one representing height and the other representing width.
Yolo_V3 uses logistic regression to predict the b-box, operating much like the linear-regression b-box adjustment in an RPN. Each time Yolo_V3 predicts a b-box, the output is (tx, ty, tw, th, to); the absolute (x, y, w, h, c) is then calculated from it by the box transformation formulas.
Logistic regression is used to give an objectness score to the region surrounded by an anchor, that is, how likely the location is to be the target. This step is done before prediction; it removes unnecessary anchors and reduces the amount of calculation.

If a template box is not optimal, we do not predict from it even if it exceeds the threshold we set. Unlike Faster R-CNN, Yolo_V3 operates on only one prior, the best one: logistic regression is used to find the prior with the highest objectness score among the nine anchor priors, modelling the mapping from prior to objectness score.
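Picking the single best prior is commonly done by overlap with the ground-truth box; the following sketch uses IoU matching as a stand-in for the scoring described above (the IoU criterion and both function names are assumptions for illustration, not the patent's exact procedure):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def best_prior(gt_box, priors):
    """Index of the single best-matching prior among the anchor priors."""
    return max(range(len(priors)), key=lambda i: iou(gt_box, priors[i]))
```

Only the prior returned by `best_prior` is then used for prediction, which is the "one prior" behaviour that distinguishes Yolo_V3 from Faster R-CNN.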
Prediction of Target Boundary Frame
The Yolo_V3 network makes its convolutional predictions with (4 + 1 + C) × K convolution kernels of size 1 × 1 on each of the three feature maps, where K is the number of bounding-box priors (K defaults to 3) and C is the number of predicted target categories. Of these, 4K parameters predict the offsets of the target boundary boxes, K parameters predict the probability that a box contains a target, and C·K parameters predict the probabilities that the K preset boxes correspond to each of the C target categories. The dotted rectangle in the figure is a preset boundary box, and the solid rectangle is the predicted boundary box computed from the offsets predicted by the network. From the centre coordinates and the width and height of the preset box on the feature map, the network-predicted centre offsets and width/height ratios are applied to obtain the final predicted boundary box; the transformation from preset box to predicted box is shown in the formula on the right of the figure. The sigmoid function reduces the predicted offset to between 0 and 1, so that the centre of the boundary box stays fixed within one cell; the author notes that this accelerates the convergence of the network.
The following figure shows the size of the feature map of three prediction layers and the size of the preset boundary box on each feature map.
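The preset-box-to-predicted-box transformation just described can be sketched as follows; the function and parameter names are illustrative, but the formulas are the standard Yolo_V3 box transform:

```python
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Transform predicted offsets into an absolute box: (cx, cy) is the
    grid-cell corner, (pw, ph) the preset box's width and height."""
    bx = cx + sigmoid(tx)   # sigmoid pins the centre inside its cell
    by = cy + sigmoid(ty)
    bw = pw * math.exp(tw)  # width and height scale the prior exponentially
    bh = ph * math.exp(th)
    return bx, by, bw, bh
```

With zero offsets the centre lands in the middle of the cell and the preset width and height pass through unchanged.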
Calculation of Loss Function (formula 1)

The loss function of Yolo_V3 is divided into three parts: the target location-offset loss, the target confidence loss and the target classification loss, combined with balance coefficients λ:

$$\Delta(O, o, C, c, l, g) = \lambda_{conf} L_{conf}(o, c) + \lambda_{cla} L_{cla}(O, C) + \lambda_{loc} L_{loc}(l, g) \tag{1}$$
Target Confidence Loss (formula 2)

The target confidence can be understood as the probability that a target exists inside the predicted rectangular box. Binary cross entropy is used for the confidence loss: $o_i \in \{0, 1\}$ denotes whether a target exists in predicted target boundary box $i$ (0 for non-existence, 1 for existence), and $\hat{c}_i$ is the sigmoid probability, predicted by the network, that box $i$ contains a target.

$$L_{conf}(o, c) = -\sum_{i} \left( o_i \ln(\hat{c}_i) + (1 - o_i) \ln(1 - \hat{c}_i) \right) \tag{2}$$

$$\hat{c}_i = \mathrm{Sigmoid}(c_i)$$
Target Category Loss (formula 3)

The target category loss also uses the binary cross-entropy loss: $O_{ij} \in \{0, 1\}$ indicates whether a class-$j$ target really exists in predicted target boundary box $i$ (0 for non-existence, 1 for existence), and $\hat{C}_{ij}$ is the sigmoid probability, predicted by the network, that boundary box $i$ contains a class-$j$ target.

$$L_{cla}(O, C) = -\sum_{i \in Pos} \sum_{j \in cla} \left( O_{ij} \ln(\hat{C}_{ij}) + (1 - O_{ij}) \ln(1 - \hat{C}_{ij}) \right) \tag{3}$$

$$\hat{C}_{ij} = \mathrm{Sigmoid}(C_{ij})$$

Target Location Loss (formula 4)
The location loss is the sum of squared differences between the true deviation values and the predicted deviation values: $\hat{l}_i^m$ is the predicted coordinate offset of rectangular box $i$, and $\hat{g}_i^m$ is the coordinate offset between the matched ground-truth box (GTbox) and the default box, computed from the predicted target box parameters, the default box parameters and the matched real target box parameters. All of these parameters are mapped onto the prediction feature map.

$$L_{loc}(l, g) = \sum_{i \in Pos} \sum_{m \in \{x, y, w, h\}} \left( \hat{l}_i^m - \hat{g}_i^m \right)^2 \tag{4}$$
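The three loss terms reduce to two primitive operations, which can be sketched directly (function names are assumptions; the indicator masks and balance coefficients of formula 1 are omitted for brevity):

```python
import math

def bce(o, c_hat):
    """Binary cross entropy, as used by the confidence loss (2) and,
    summed over classes, by the category loss (3).

    o:     0/1 target indicators
    c_hat: sigmoid probabilities in (0, 1)
    """
    return -sum(oi * math.log(ci) + (1 - oi) * math.log(1 - ci)
                for oi, ci in zip(o, c_hat))

def location_loss(l_hat, g_hat):
    """Sum of squared offset differences, formula (4)."""
    return sum((li - gi) ** 2 for li, gi in zip(l_hat, g_hat))
```

The total loss of formula 1 is then just a weighted sum of `bce` over confidences, `bce` over class probabilities of positive boxes, and `location_loss` over positive-box offsets.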
Procedure
Step 1: Data Acquisition
In this project we used MS COCO, a public data set in the field of target detection. Built by Microsoft, it covers detection, segmentation, key-points and other tasks, and mainly addresses three scenarios: detecting non-iconic views of objects (detection in the usual sense), contextual reasoning between objects, and precise 2-D localization of objects (the usual segmentation problem). The COCO data set contains on average 3.5 categories and 7.7 instance targets per picture, so it is large not only in the amount of data but also in the number of types and instances.
Step 2: Data Pre-processing
Then we pre-process the data set. We extract the features of the data and label it, then convert the XML-format labels of the data set to TXT labels to fit the Yolo_V3 network. This project is pedestrian detection, so we define the picture category as human. Finally, we divided the data set into INRIA_train and INRIA_test for training and testing respectively.
Step 3: Training and optimization
In the optimization of the parameter set, we put the data into the network for training in batches, in order to reduce the computation and improve the training effect.
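Batching can be sketched as a simple generator; the name is illustrative, and a real training loop would also shuffle the data between epochs:

```python
def batches(dataset, batch_size):
    """Yield the data set in fixed-size batches; the last batch may be short."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]
```

Feeding the network one such batch at a time bounds the memory used per step, which is what keeps the computation tractable.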
Besides, we use Transfer Learning to improve our Yolo_V3 structure. Transfer learning uses a pre-trained model as a checkpoint from which to train and generate a neural network model supporting the new task. It transfers common feature data and information, so that this knowledge need not be re-learned and fast learning is achieved.
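The checkpoint reuse can be sketched over plain weight dictionaries; the function name, the `backbone.` prefix and the dict representation are all illustrative assumptions rather than the patent's actual mechanism:

```python
def transfer_weights(pretrained, new_model, shared_prefix="backbone."):
    """Copy pre-trained backbone weights (the checkpoint) into a new
    model's weight dictionary; the task-specific head keeps its fresh
    initialization and is trained on the new data."""
    for name, weight in pretrained.items():
        if name.startswith(shared_prefix) and name in new_model:
            new_model[name] = weight
    return new_model
```

Only the shared feature-extraction layers are copied, so the detection head is free to specialize on the pedestrian data.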
Step 4: Testing
We adjust the parameters of the network constantly in order to reach optimal performance, and set the number of classes to 1 so that the recognition rate is as accurate as possible. Then we put the test set into the network and obtain the recognition accuracy.
The data and recognition rates are as follows:
Table 1: The data before Transfer Learning:
Class | Images | Targets | Recognition rate |
---|---|---|---|
all | 288 | 597 | 0.892 |
all | 288 | 597 | 0.909 |
all | 288 | 597 | 0.917 |
Table 2: The data after Transfer Learning:
Class | Images | Targets | Recognition rate |
---|---|---|---|
all | 288 | 597 | 0.925 |
all | 288 | 597 | 0.928 |
all | 288 | 597 | 0.93 |
We can observe clearly that the recognition rate is about 90 percent before transfer learning, and after transfer learning it improves to about 93 percent. This shows that transfer learning is a good method for training convolutional neural networks.
Claims (3)
- CLAIM 1. Method of human detection research and implementation based on deep learning, wherein said method has fully trained the data, repeatedly drawn random training selections from the original data, and randomly selected the final test results, so that the results are true and reliable.
- 2. The method according to claim 1, wherein, to avoid over-fitting, every DBL element in the yolov3 network structure has a BN layer to regularize the data, which ensures the robustness of the network structure; the accuracy of recognition is further improved by parameter adjustment and transfer learning.
- 3. The method according to claim 1, wherein, in the process of training, repeated data is trained repeatedly to prevent accidental situations, so that the results are reliable.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2019101224A AU2019101224A4 (en) | 2019-10-05 | 2019-10-05 | Method of Human detection research and implement based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2019101224A AU2019101224A4 (en) | 2019-10-05 | 2019-10-05 | Method of Human detection research and implement based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2019101224A4 true AU2019101224A4 (en) | 2020-01-16 |
Family
ID=69146725
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2019101224A Ceased AU2019101224A4 (en) | 2019-10-05 | 2019-10-05 | Method of Human detection research and implement based on deep learning |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2019101224A4 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408321A (en) * | 2020-03-16 | 2021-09-17 | 中国人民解放军战略支援部队信息工程大学 | Real-time target detection method and device for lightweight image and video data |
CN113408321B (en) * | 2020-03-16 | 2023-08-22 | 中国人民解放军战略支援部队信息工程大学 | Real-time target detection method and device for lightweight image and video data |
CN113469321A (en) * | 2020-03-30 | 2021-10-01 | 聚晶半导体股份有限公司 | Object detection device and object detection method based on neural network |
US11495015B2 (en) | 2020-03-30 | 2022-11-08 | Altek Semiconductor Corp. | Object detection device and object detection method based on neural network |
CN113077406A (en) * | 2020-11-25 | 2021-07-06 | 无锡乐骐科技有限公司 | Convolution filling method based on optimization |
CN113077406B (en) * | 2020-11-25 | 2022-06-14 | 无锡乐骐科技股份有限公司 | Image convolution filling method based on optimization |
CN112633174A (en) * | 2020-12-23 | 2021-04-09 | 电子科技大学 | Improved YOLOv4 high-dome-based fire detection method and storage medium |
CN112633174B (en) * | 2020-12-23 | 2022-08-02 | 电子科技大学 | Improved YOLOv4 high-dome-based fire detection method and storage medium |
CN113052184A (en) * | 2021-03-12 | 2021-06-29 | 电子科技大学 | Target detection method based on two-stage local feature alignment |
CN117036234A (en) * | 2023-05-09 | 2023-11-10 | 中国铁路广州局集团有限公司 | Mixed steel rail ultrasonic B-display map damage identification method, system and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2019101224A4 (en) | Method of Human detection research and implement based on deep learning | |
US10586103B2 (en) | Topographic data machine learning method and system | |
Nie et al. | Pavement distress detection based on transfer learning | |
NL2025689B1 (en) | Crop pest detection method based on f-ssd-iv3 | |
CN108830188A (en) | Vehicle checking method based on deep learning | |
Aditya et al. | Batik classification using neural network with gray level co-occurence matrix and statistical color feature extraction | |
US11468266B2 (en) | Target identification in large image data | |
CN110543906B (en) | Automatic skin recognition method based on Mask R-CNN model | |
CN104462494A (en) | Remote sensing image retrieval method and system based on non-supervision characteristic learning | |
CN110334594A (en) | A kind of object detection method based on batch again YOLO algorithm of standardization processing | |
CN108831530A (en) | Vegetable nutrient calculation method based on convolutional neural networks | |
Fan et al. | A novel sonar target detection and classification algorithm | |
Moate et al. | Vehicle detection in infrared imagery using neural networks with synthetic training data | |
Shi et al. | Underwater dense targets detection and classification based on YOLOv3 | |
Li et al. | An outstanding adaptive multi-feature fusion YOLOv3 algorithm for the small target detection in remote sensing images | |
CN116882486B (en) | Method, device and equipment for constructing migration learning weight | |
CN109271833A (en) | Target identification method, device and electronic equipment based on the sparse self-encoding encoder of stack | |
Hazra et al. | Handwritten English character recognition using logistic regression and neural network | |
Rishita et al. | Dog breed classifier using convolutional neural networks | |
CN106951888B (en) | Relative coordinate constraint method and positioning method of human face characteristic point | |
Nacir et al. | YOLO V5 for traffic sign recognition and detection using transfer learning | |
US20210303757A1 (en) | Semiconductor fabrication process parameter determination using a generative adversarial network | |
Shishkin et al. | Implementation of yolov5 for detection and classification of microplastics and microorganisms in marine environment | |
CN114821098A (en) | High-speed pavement damage detection algorithm based on gray gradient fusion characteristics and CNN | |
Mahara et al. | Integrating location information as geohash codes in convolutional neural network-based satellite image classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |